
Statistical Phrase Extraction Techniques In Natural Lang

$0 · Z01 · FY2004 · LM · NIH

National Library Of Medicine

Abstract

The ability to locate important phrases in natural language text is useful for indexing and for placing hyperlinks in text; in either case, the goal is to improve access to the textual material. Historically, the most common method for locating phrases has been a part-of-speech tagger. We have developed a new approach that uses scoring algorithms to rank phrases by how useful they are likely to be, and several such methods have been developed and tested. These are being combined with stemming and with methods for finding inflectional variants of a phrase, which are treated as synonymous for retrieval purposes. The UMLS (Unified Medical Language System) is also being used to find synonymous phrases for indexing. These methods are being applied to find useful phrases in NCBI's electronic textbook project, which is online but still under development. They are also beginning to be applied to the PubMed Central database of journal articles in biology and medicine, and to indexing the OCR text produced by scanning back issues of journals for that database.
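The abstract does not say which scoring functions the project uses. Purely as an illustration, and not as the project's method, the Python sketch below ranks adjacent word pairs by pointwise mutual information (PMI), a standard collocation statistic, after a crude suffix-stripping step that pools simple plural variants so that "gene expressions" and "gene expression" count as one phrase. The names `normalize`, `score_bigrams`, `STOPWORDS`, and the `min_count` threshold are hypothetical, and the suffix rules are a stand-in for a real stemmer such as Porter's.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list: pairs containing these words are never scored.
STOPWORDS = {"a", "an", "and", "as", "for", "in", "is", "of", "on", "the", "to", "with"}


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def normalize(word):
    """Crude suffix stripping so simple plural variants share one key.

    Hypothetical stand-in for a real stemmer; it only handles
    -ies -> -y and a trailing -s (but not -ss, -us, -is).
    """
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "y"
    if word.endswith("s") and not word.endswith(("ss", "us", "is")) and len(word) > 3:
        return word[:-1]
    return word


def score_bigrams(docs, min_count=2):
    """Rank adjacent word pairs by pointwise mutual information (PMI).

    PMI = log2(P(w1 w2) / (P(w1) * P(w2))); a higher score means the pair
    co-occurs more often than chance. Pairs seen fewer than min_count
    times are dropped because PMI overrates rare pairs.
    """
    unigrams = Counter()
    bigrams = Counter()
    for doc in docs:
        toks = [normalize(t) for t in tokenize(doc)]
        unigrams.update(toks)
        bigrams.update(zip(toks, toks[1:]))  # pairs never cross document boundaries

    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count or w1 in STOPWORDS or w2 in STOPWORDS:
            continue
        p_pair = count / n_bi
        p_w1 = unigrams[w1] / n_uni
        p_w2 = unigrams[w2] / n_uni
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    docs = [
        "Gene expression in yeast cells.",
        "Gene expressions were measured in liver cells.",
        "The protein binds the receptor.",
        "Measured gene expression of the protein.",
    ]
    for pair, score in score_bigrams(docs):
        print(" ".join(pair), round(score, 2))
```

On the sample documents only "gene expression" survives the `min_count` and stopword filters, and it does so only because `normalize` pools "expressions" with "expression". This mirrors, in miniature, why the abstract pairs phrase scoring with inflectional-variant handling: without it, the counts for one phrase are split across its surface forms.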

Original record: NIH RePORTER.