Target-Based Document-Independent Information Extraction

$429,549 | FY2001 | CSE | NSF

Brigham Young University, Provo UT

Investigators

David W Embley (contact), Deryle Lonsdale, Douglas M Campbell, Stephen W Liddle, Yiu-Kai Dennis Ng

Abstract

With ever-growing volumes of data in widely varying formats, there is a need to sift and funnel information to users according to their specific requirements. This project addresses the challenge of finding, extracting, and delivering appropriate data by developing a versatile framework that is target-based (i.e., based on a user's description of the desired information) and document-independent (i.e., robust, not failing when documents change or when new documents of interest are encountered). A combination of document-related clues, drawn from textual content as well as geometrical and organizational layout, enables processing across various document formats. Developers and users specify areas of interest via descriptive ontologies (i.e., declarations of information types and concept relationships). These ontologies facilitate reformulating, matching, and merging retrieved information. The result of these efforts will be a comprehensive infrastructure to extract expertly, organize automatically, and summarize succinctly critical information in a queryable, personalized view. An online repository will contain research results, downloadable software (including source code), and a Web interface enabling user access to the various tools and engines developed. Potentially, this technology can be embedded in personal agents; leveraged in customized search, filtering, and extraction tools; and used to provide tailored views of data via integration, organization, and summarization.

http://www.deg.byu.edu
