III: Medium: Better Information Integration through Uncertainty

$1,184,825 · FY2009 · CSE · NSF

Stanford University, Stanford CA

Abstract

This award is funded under the American Recovery and Reinvestment Act of 2009 (Public Law 111-5). The problem of providing seamless, integrated querying over multiple interrelated sources of information has plagued the database, information management, information retrieval, and artificial intelligence research communities for decades. There have been successful lines of research addressing specific components of the data integration problem, and large "one-off" systems have been built that successfully integrate specific information sources in specific domains. However, a completely general solution to the data integration problem is thought not to be realizable. The investigator is developing a new type of information integration system that is both novel and realizable. It is based on the following premises and components.

1) The system provides a general data integration solution targeted at a certain type of environment: one in which multiple sources have joining, overlapping, and potentially conflicting information about the same or closely related real-world entities.

2) The system permits and exploits uncertainty as an integral part of data integration. Uncertainty may be present in source data, source schemata, the integration process, integrated schemata, and integrated data. In fact, uncertainty can play a key role in successful information integration.

3) The system relies on general-purpose entity resolution as a fundamental building block of the integration process and the integrated information. Furthermore, the system retains both the uncertainty and the lineage associated with the entity-resolution process.

4) The system incorporates powerful lineage capabilities, tracking where, when, and how data was produced, how it has evolved over time, and how it has been combined and manipulated as part of the integration process. Lineage is used to enhance the integration process, and it is offered to the end user in a variety of forms for data understanding and conflict-resolution purposes.

Further information on the project can be found at the project web page: http://infolab.stanford.edu/udi/
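As a rough illustration of how premises 2 through 4 could fit together, the sketch below resolves records from several sources into entities while keeping conflicting values (with normalized weights) and the sources that supplied them. It is not the project's design: the `Record` and `Entity` types, the per-record `confidence` score, the name-similarity rule, and the 0.85 threshold are all hypothetical choices made for this example.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher


@dataclass(frozen=True)
class Record:
    source: str        # which source supplied the record (kept as lineage)
    name: str
    city: str
    confidence: float  # hypothetical trust score for this record, in [0, 1]


@dataclass
class Entity:
    records: list = field(default_factory=list)

    def alternatives(self, attr):
        """Map each observed value to (normalized weight, supplying sources)."""
        weights, lineage = {}, {}
        for r in self.records:
            value = getattr(r, attr)
            weights[value] = weights.get(value, 0.0) + r.confidence
            lineage.setdefault(value, []).append(r.source)
        total = sum(weights.values())
        return {v: (w / total, lineage[v]) for v, w in weights.items()}


def similar(a, b, threshold=0.85):
    """Toy match rule (an assumption): case-insensitive name similarity."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def resolve(records):
    """Greedy clustering: attach each record to the first matching entity."""
    entities = []
    for rec in records:
        for ent in entities:
            if any(similar(rec.name, other.name) for other in ent.records):
                ent.records.append(rec)
                break
        else:
            # No existing entity matched, so this record starts a new one.
            entities.append(Entity([rec]))
    return entities


records = [
    Record("source_a", "Jane Q. Smith", "Palo Alto", 0.9),
    Record("source_b", "Jane Q Smith", "Stanford", 0.6),
    Record("source_c", "Robert Jones", "Boston", 0.8),
]
entities = resolve(records)
for ent in entities:
    # Conflicting cities are kept side by side rather than overwritten.
    print(ent.alternatives("city"))
```

Here the two "Jane" records merge into one entity whose city remains uncertain between the two reported values, and each alternative carries the list of sources behind it, so a user could trace a conflict back to its origin. A real system would need a far more careful match rule and a principled uncertainty model.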
