Statistical phrase extraction techniques in natural language databases.

$0 · Z01 · FY2000 · LM · NIH

National Library of Medicine

Abstract

The ability to locate important phrases in natural language text is useful for indexing and for placing hyperlinks in text; in either case, the goal is to improve access to the textual material. In the past, the most common method for locating phrases has been a part-of-speech tagger. We have developed a new approach that uses a number of scoring algorithms to rank phrases by how useful they are likely to be. Eight different methods have been developed and tested. They have proved effective at ranking known phrases from the Unified Medical Language System (UMLS), developed by the National Library of Medicine, near the top of all the phrases obtained from subsets of the Medline document collection. Six of the methods have been combined to produce optimal scoring methods, which have proven useful for extracting phrases of a quality similar to those already in the UMLS. The methods also appear promising as a way to mark text with hyperlinks for navigation. Two papers on this topic are being published, and the methods are being applied to the electronic textbook project at NCBI.
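
The abstract does not name its eight scoring methods or say how six of them are combined, so the following is only a minimal sketch of the general idea under stated assumptions. It restricts candidates to two-word phrases, uses three common association statistics (raw frequency, pointwise mutual information, and the Dice coefficient) as stand-ins for the unspecified methods, combines them by summing each candidate's rank under every method (a hypothetical combination rule), and measures how highly a set of known phrases ranks using average precision, as a stand-in for the UMLS evaluation. All function names are hypothetical.

```python
import math
from collections import Counter


def count_ngrams(docs):
    """Count words and adjacent word pairs (candidate two-word phrases)."""
    unigrams, bigrams = Counter(), Counter()
    for tokens in docs:
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))  # pairs never cross documents
    return unigrams, bigrams


# Stand-in scoring methods (assumptions, not the methods from the abstract).
# Each takes a word pair plus corpus counts and returns a score; higher is better.
def score_frequency(pair, uni, bi, n_uni, n_bi):
    return bi[pair]


def score_pmi(pair, uni, bi, n_uni, n_bi):
    # Pointwise mutual information: log2 P(x, y) / (P(x) P(y)).
    p_xy = bi[pair] / n_bi
    return math.log2(p_xy / ((uni[pair[0]] / n_uni) * (uni[pair[1]] / n_uni)))


def score_dice(pair, uni, bi, n_uni, n_bi):
    # Dice coefficient: 2 f(x, y) / (f(x) + f(y)).
    return 2 * bi[pair] / (uni[pair[0]] + uni[pair[1]])


SCORERS = (score_frequency, score_pmi, score_dice)


def rank_phrases(docs, scorers=SCORERS, min_count=2):
    """Rank candidate phrases by the sum of their ranks under each scorer."""
    uni, bi = count_ngrams(docs)
    n_uni, n_bi = sum(uni.values()), sum(bi.values())
    # Sorting candidates alphabetically first makes tie-breaking deterministic.
    candidates = sorted(p for p, c in bi.items() if c >= min_count)
    rank_sum = dict.fromkeys(candidates, 0)
    for scorer in scorers:
        ordered = sorted(
            candidates,
            key=lambda p: scorer(p, uni, bi, n_uni, n_bi),
            reverse=True,  # sort is stable: equal scores keep alphabetical order
        )
        for rank, pair in enumerate(ordered, start=1):
            rank_sum[pair] += rank
    best_first = sorted(candidates, key=lambda p: (rank_sum[p], p))
    return [" ".join(p) for p in best_first]


def average_precision(ranking, known):
    """Average precision of the known phrases within a ranked list.

    Known phrases missing from the ranking count as misses.
    """
    hits, total = 0, 0.0
    for k, phrase in enumerate(ranking, start=1):
        if phrase in known:
            hits += 1
            total += hits / k  # precision at the rank of each known phrase
    return total / len(known) if known else 0.0


if __name__ == "__main__":
    docs = [
        "acute heart attack in the clinic".split(),
        "the heart attack was mild".split(),
        "heart attack and blood pressure".split(),
        "blood pressure in the clinic".split(),
    ]
    ranking = rank_phrases(docs)
    print(ranking)  # ['blood pressure', 'heart attack', 'in the', 'the clinic']
    print(average_precision(ranking, {"heart attack", "blood pressure"}))  # 1.0
```

Summing ranks rather than raw scores is one simple way to combine methods whose scores live on different scales (counts, log ratios, bounded coefficients); a real system might instead weight or tune the combination against a reference vocabulary such as the UMLS.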
