SBIR Phase II: Authoring Assistance via Contextual Semantic Labeling

$1,194,443 | FY2023 | TIP | NSF

Docugami, Inc., Kirkland, WA

Abstract

The broader impact/commercial potential of this Small Business Innovation Research (SBIR) Phase II project comes from extracting meaningful, useful, specific information from “dark data.” Dark data are the countless documents that companies produce and receive whose information goes unused, usually because the documents are in formats that computers cannot interpret. Many of these documents do not even contain accessible text, only images of text. Word-processing documents and emails do contain text, but no information about what that text means. A computer can easily tell that “10/05/2022” is a date, but the date becomes useful only once it is known whether it marks the start of a particular agreement, its end, or something else. This project uses a range of artificial intelligence (AI) techniques that work in real time while people are writing new documents or extracting data from existing ones. The AI learns quickly from examples, finds patterns across similar documents, and applies that learning so that users do not have to search for the same kinds of items again and again in varying contexts. This saves tedious work and reduces errors. The extracted information helps companies understand and analyze their documents and make business decisions.

This Small Business Innovation Research (SBIR) Phase II project identifies and extracts useful information items from long natural-language documents, especially contracts and agreements. The technology identifies items much more specifically than typical extraction methods: for example, it labels person, organization, and place names not only by type but also by the role each one plays. Likewise, addresses, dates, money amounts, and other data items become useful only when it is known what they are for. This is a valuable focus for advancing Natural Language Understanding. The team combines and extends Machine Learning techniques such as few-shot learning, fine-tuning, and semantic parsing to achieve these stronger, more “semantic” results.
This solution allows companies to generate value from huge troves of information they already collect but cannot yet automate or leverage. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.