GrantIndex

SBIR Phase II: Information Extraction from Synthetic Procedures

$781,919 · FY2001 · TIP · NSF

Intellichem Inc., Bend, OR

Abstract

This Small Business Innovation Research (SBIR) Phase II project aims to develop a collection of software tools for selectively extracting information from the running text of synthetic recipes. Synthetic procedures are batch recipes used to create new chemical entities for drug discovery. The project's ultimate aim is to automate information extraction and to place the extracted information in a computer-understandable data structure that fully captures the data and semantics of each synthetic recipe.

The Phase I program successfully demonstrated the feasibility of the approach by building a prototype system and using it to solve a range of representative information-extraction problems involving synthetic recipes. In Phase II, the objectives are to (1) refine and extend the features of the prototype system; (2) implement machine-learning capabilities for inducing extraction rules; (3) build focused demonstration applications; and (4) test, evaluate, and validate the software system with research collaborators at pharmaceutical companies. The long-term goal of the program is a commercial software toolkit that lets chemists easily build systems for extracting information from synthetic recipes.

Recipes for more than 19 million unique compounds appear in the public literature, and pharmaceutical-company archives hold a comparable number. The vast majority of these procedures exist only as unstructured running text. Intellichem, Inc. offers tools for extracting synthetic-recipe information into computer-understandable data structures, which will benefit database construction and updating, summarization, chemical process discovery, knowledge reuse, chemist productivity, and chemistry-related e-commerce.
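The abstract does not describe how Intellichem's extractor works. The following is therefore only a minimal illustrative sketch of the general idea it names: turning one sentence of a synthetic procedure into a structured record. It assumes hand-written regular-expression rules and a simple "name (amount unit, n mmol)" reagent convention. The classes `Reagent`, `Solvent`, and `Step`, the function `extract_step`, and the example sentence are all hypothetical. The sketch does not learn rules from data, which was one of the Phase II objectives.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Hypothetical rules: a reagent written as "name (1.87 g, 10.0 mmol)"
# and a solvent written as "name (20 mL)". Names are single tokens here.
REAGENT_RE = re.compile(
    r"(?P<name>[\w\-]+)\s*\(\s*(?P<amount>\d+(?:\.\d+)?)\s*(?P<unit>mg|g|mL)"
    r"\s*,\s*(?P<mmol>\d+(?:\.\d+)?)\s*mmol\s*\)"
)
SOLVENT_RE = re.compile(r"(?P<name>[\w\-]+)\s*\(\s*(?P<volume>\d+(?:\.\d+)?)\s*mL\s*\)")
TEMP_RE = re.compile(r"(?P<temp>-?\d+(?:\.\d+)?)\s*°C")
TIME_RE = re.compile(r"(?P<value>\d+(?:\.\d+)?)\s*(?P<unit>h|min)\b")


@dataclass
class Reagent:
    name: str
    amount: float
    unit: str
    mmol: float


@dataclass
class Solvent:
    name: str
    volume_ml: float


@dataclass
class Step:
    """Structured record for one procedure sentence (hypothetical schema)."""
    text: str
    reagents: List[Reagent] = field(default_factory=list)
    solvents: List[Solvent] = field(default_factory=list)
    temperature_c: Optional[float] = None
    duration: Optional[Tuple[float, str]] = None


def extract_step(sentence: str) -> Step:
    """Apply each hand-written rule to the sentence and fill a Step record."""
    step = Step(text=sentence)
    for m in REAGENT_RE.finditer(sentence):
        step.reagents.append(
            Reagent(m["name"], float(m["amount"]), m["unit"], float(m["mmol"]))
        )
    for m in SOLVENT_RE.finditer(sentence):
        step.solvents.append(Solvent(m["name"], float(m["volume"])))
    if (m := TEMP_RE.search(sentence)) is not None:
        step.temperature_c = float(m["temp"])
    if (m := TIME_RE.search(sentence)) is not None:
        step.duration = (float(m["value"]), m["unit"])
    return step


if __name__ == "__main__":
    text = ("To a solution of 4-bromoanisole (1.87 g, 10.0 mmol) in THF (20 mL) "
            "at 0 °C was added NaBH4 (0.38 g, 10.0 mmol), and the mixture was "
            "stirred for 2 h.")
    print(extract_step(text))
```

Running it on the invented sentence yields two reagents (`4-bromoanisole`, `NaBH4`), one solvent (`THF`, 20 mL), a temperature of 0 °C, and a duration of 2 h. Fixed patterns like these break easily on real procedure text, with its multi-word names, ranges, and alternative units. That brittleness is the kind of problem the project's rule-induction objective was meant to address.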
