III: EAGER - Expressive Scalable Querying over Integrated Linked Open Data
Wright State University, Dayton OH
Investigators
Abstract
Linked Open Data (LOD) is rapidly developing into an open data movement to connect a large variety of data across the World Wide Web using standards adopted by the World Wide Web Consortium (W3C). Driven by researchers, government agencies and companies, the resulting Web of Data has grown to over 25 billion RDF triples and is growing exponentially. However, simply putting collections of data on the Web will be of very limited value. The key to unlocking the value for developing more powerful search, browsing, exploration and analysis is to richly interlink or semantically integrate components of LOD. Given the size, growth rate, heterogeneity and growing areas of coverage, manual semantic integration or interlinking is not practical. Furthermore, current techniques focus on the 'same-as' relationship, which is much abused because of its limited expressivity. This calls for ways to represent and identify richer and more explicit relationships between different entities that reflect the richness of relations that exist in the real world. This project develops exploratory techniques to richly interlink components of LOD and then addresses the challenge of querying the LOD cloud, i.e., of obtaining answers to questions which require accessing, retrieving and combining information from different parts of the LOD cloud. Techniques for overcoming semantic heterogeneity include: semantic enrichment through Wikipedia bootstrapping; semantic integration through abstraction by means of upper-level ontologies; and massively parallel methods for tractable ontology reasoning. Specifically, this research will: (1) identify richer, broader, and more relevant relationships between LOD datasets at the instance and schema levels (these relationships will promote better knowledge discovery, querying, and mapping of ontologies); (2) realize LOD query federation through an upper-level ontology; and (3) enable access to implicit knowledge through ontology reasoning.
The project involves significant risk as it treads new paths in new terrain, primarily due to the lack of descriptive information (schema) about the data provided by highly autonomous data sources, the significant syntactic and semantic heterogeneity among data originating from independent data sources, and the significantly larger scale, as well as unforeseeable obstacles associated with a rapidly changing and expanding environment. This project aims to advance the state of the art in semantic integration of large amounts of heterogeneous and autonomously developed or managed data. It seeks to fundamentally transform the landscape of LOD usage because successful LOD querying is a key enabler for a variety of applications. The results of this project could set the stage for the development and far-reaching adoption of the Semantic Web. The project is integrated with education and research-based advanced training of graduate and undergraduate students. Additional information about the project can be found at: http://knoesis.org/research/semweb/projects/ESQuILO.