Statistical phrase extraction techniques in natural language databases.

$0Z01FY2000LMNIH

National Library of Medicine


Linked publications & trials

Papers: 18834487, 18817555, 18080004, 16867190, 16843731, 16779069, 15556479, 15130538, 15073016, 12798042, 11079836, 10984469

Abstract

The ability to locate important phrases in natural language text is useful for indexing and for placing hyperlinks in text; in either case, the goal is to improve access to the textual material. Historically, the most common way to locate phrases has been a part-of-speech tagger. We have developed a new approach that uses a number of scoring algorithms to rank candidate phrases by how useful they are likely to be. Eight different methods have been developed and tested. They have proved effective at ranking known phrases from the Unified Medical Language System (UMLS), developed by the National Library of Medicine, near the top of all the phrases obtained from subsets of the Medline document collection. Six of the methods have been combined to produce optimal scoring methods, which have proven useful for extracting phrases of quality similar to those already in the UMLS. The methods also appear promising for marking text with hyperlinks for navigation. Two papers on this topic are being published, and the methods are being applied to the electronic textbook project at the National Center for Biotechnology Information (NCBI).
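The abstract does not name the eight scoring methods, so the following is only a minimal sketch of the general idea under stated assumptions: text is whitespace-tokenized, candidate phrases are adjacent word pairs, and two illustrative stand-in scores (pointwise mutual information and raw frequency) are used, not necessarily any of the project's methods. It also shows one assumed way to combine methods (averaging ranks) and one assumed way to check how high known phrases land in the ranking (average precision). All function names are hypothetical, and the small `reference` set is a toy stand-in for UMLS terms.

```python
import math
from collections import Counter

# Hypothetical helpers: PMI and frequency are illustrative stand-in scores,
# not the project's eight methods; rank averaging and average precision are
# assumed choices for combining scores and for evaluation.


def count_ngrams(docs):
    """Count unigrams and adjacent word pairs (bigram candidates) in whitespace-tokenized text."""
    unigrams, bigrams = Counter(), Counter()
    for doc in docs:
        tokens = doc.lower().split()
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def pmi_scores(unigrams, bigrams, min_count=2):
    """Score each bigram by pointwise mutual information, log P(xy) / (P(x) P(y))."""
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    scores = {}
    for (x, y), count in bigrams.items():
        if count < min_count:
            continue  # PMI is unreliable for very rare pairs, so skip them
        p_xy = count / n_bi
        p_x = unigrams[x] / n_uni
        p_y = unigrams[y] / n_uni
        scores[(x, y)] = math.log(p_xy / (p_x * p_y))
    return scores


def combine_by_mean_rank(*score_maps):
    """Merge several scoring methods by averaging each phrase's rank (lower is better)."""
    common = set.intersection(*(set(m) for m in score_maps))
    mean_rank = dict.fromkeys(common, 0.0)
    for m in score_maps:
        # Sort by descending score; break ties alphabetically so the result is deterministic.
        ordered = sorted(common, key=lambda p: (-m[p], p))
        for rank, phrase in enumerate(ordered, start=1):
            mean_rank[phrase] += rank / len(score_maps)
    return sorted(common, key=lambda p: (mean_rank[p], p))


def average_precision(ranked, reference):
    """Average precision of the reference (known) phrases within a ranked candidate list."""
    hits, total = 0, 0.0
    for i, phrase in enumerate(ranked, start=1):
        if phrase in reference:
            hits += 1
            total += hits / i
    return total / len(reference) if reference else 0.0


# Toy usage on a tiny made-up corpus.
docs = [
    "heart failure patients received beta blockers",
    "beta blockers reduce mortality in heart failure",
    "the patients with heart failure were treated",
    "the study of the patients",
]
unigrams, bigrams = count_ngrams(docs)
pmi = pmi_scores(unigrams, bigrams)
ranked = sorted(pmi, key=pmi.get, reverse=True)
combined = combine_by_mean_rank(pmi, dict(bigrams))
reference = {("heart", "failure"), ("beta", "blockers")}  # stand-in for known UMLS terms
print(ranked)
print(combined)
print(average_precision(ranked, reference))
```

On this toy corpus the two term-like pairs rank above the generic pair "the patients", so the known phrases occupy the top of the list. Real use would need far larger collections, longer candidate phrases, and better-calibrated scores, since raw PMI is known to overrate rare word pairs.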
