EAGER: Exploring and Linking Widely Distributed Data on the Semantic Web

$197,407 | FY2011 | CSE | NSF

Rensselaer Polytechnic Institute, Troy NY

Investigators

Abstract

The goal of this project is to explore key algorithms, technologies and protocols that will lead to the next level of development of the original Semantic Web vision of the 'Web of Data', a Web in which the unstructured texts of the current Web are seamlessly integrated with information currently locked in structured databases. While a huge amount of open data is being made available on the Web, especially in the 'Open Government Data' arena, traditional computing techniques are inadequate for finding this data, for linking it to other data, and for reusing and repurposing the data resources. The project aims to show that an innovative combination of Semantic Web technologies will provide the basis for a new approach to large-scale, online data integration and use. The research team will demonstrate its techniques by showing their efficacy on a combination of Open Government datasets being released around the world. There are already hundreds of thousands of these databases made available in machine-readable formats by countries, municipalities and cities, and the number is growing exponentially. This makes Open Government Data a large-scale testbed for Web-based data integration. The research team has collected the metadata for close to 400,000 datasets from more than 60 catalogs in 20 countries, published in fourteen different languages. The project will show how the combination of linked-data representations, machine-readable metadata and Semantic Web ontologies makes it possible to federate data across these catalogs, domains, and cultures. The researchers will develop the foundational algorithms that make it possible to find, access, integrate and analyze ad hoc combinations of these datasets on the fly.
Thus, the project will demonstrate techniques, and develop a proof-of-concept demonstration, showing that multiple data sources across the Web can be integrated by combining semantic information of different kinds. The researchers will show that it is possible to build search and reuse tools that function across large distributed data collections, and they will explore the key research challenges in creating Web-scale linked-open-data repositories. The success of this project will demonstrate that, by bridging the gap between structured and unstructured sources, it is possible to develop techniques that set the stage for a second generation of more powerful Semantic Web tools. Such tools will allow scientists, engineers, and eventually end users to perform a range of analyses without needing the large proprietary data resources currently available to only a small set of researchers working in companies with access to 'big data'. Additional information about the project can be found at: http://data.rpi.edu
