SBIR Phase I: Xtractica: A System for Extracting Coherent Data from Documents
Xsb, Inc., New York NY
Abstract
This Small Business Innovation Research (SBIR) Phase I project will investigate the feasibility of designing and building a software system, Xtractica. The system will allow domain experts to specify programs that transform unstructured or partially structured data from a variety of document sources, such as World Wide Web sites, PDF files, and text, into structured, coherent, and readily usable information. Xtractica will consist of a set of tightly integrated, powerful syntactic and semantics-driven data extraction technologies managed from a graphical user interface. These technologies will retrieve information that was created for human readers, then extract and reason about it to create knowledge that can support automated decision making and transactions. An important feature of Xtractica is that users can rapidly create extractors simply by supplying examples of the data to be extracted. It will thus empower users who are knowledgeable about their application domains, but not necessarily trained as computing technologists, to structure data into knowledge. The Phase I project will develop the operational specifications of Xtractica and determine its feasibility by prototyping its critical components. Phase II will then produce a fully functional Xtractica system based on the results of Phase I. Finally, Phase III will make Xtractica commercially available to clients with diverse business interests, including content aggregation, e-procurement, ERP, and supply chain management vendors.