CAREER: Hybrid methods for acquisition and tuning of lexical information

$500,000 | FY2004 | CSE | NSF

Ohio State University Research Foundation, Columbus, OH

Abstract

Broad-coverage dictionaries and ontologies for natural language processing (NLP) are difficult and costly to create and maintain by hand. It is therefore desirable to learn them from distributional information, such as that obtainable from unlabeled or sparsely labeled text corpora. Many linguistic and psycholinguistic theories are distributional, but they emphasize local neighborhood structure more than previous NLP approaches have. Successful visualization techniques such as keyword-in-context (KWIC) displays also rely on preserving neighborhood structure. A similar emphasis appears in emerging data-reduction techniques such as locally linear embedding (LLE) and min-cut algorithms; the project is investigating their application to language data. While the immediate goal of the project is a better understanding of lexical tuning and acquisition, the resulting dictionaries, ontologies, and mapping techniques could help information professionals (such as librarians, translators, patent examiners, and paralegal researchers) navigate corpora, understand the significance of the data they see, and incorporate insights derived from the data into their working practice. The PI and his students are also integrating computational linguistics into the undergraduate curriculum of the Department of Linguistics and creating new courses aimed primarily at humanities majors, giving those students new options for meeting the scientific, mathematical, and quantitative components of the university's breadth requirement.
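As a rough illustration of what "local neighborhood structure" in distributional data can mean, the sketch below counts which words occur within a small window around each word of a toy tokenized corpus, then ranks other words by cosine similarity of those context counts. The corpus, window size, and function names (`context_vectors`, `cosine`, `nearest_neighbors`) are hypothetical. This is a generic distributional-similarity baseline shown for orientation only; it is not the project's LLE or min-cut method.

```python
from collections import Counter, defaultdict
import math


def context_vectors(tokens, window=2):
    """Hypothetical helper: count co-occurrences within +/- `window` positions."""
    vectors = defaultdict(Counter)
    for i, word in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse count vectors (Counter returns 0 for missing keys)."""
    dot = sum(count * v[w] for w, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def nearest_neighbors(vectors, word, k=2):
    """Return the k words whose context counts are most similar to `word`'s."""
    # Look up the target with .get so the defaultdict is not mutated mid-iteration.
    target = vectors.get(word, Counter())
    scores = [(cosine(target, vec), other)
              for other, vec in vectors.items() if other != word]
    # Highest similarity first; ties broken alphabetically for determinism.
    scores.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other for _, other in scores[:k]]


if __name__ == "__main__":
    tokens = "the cat sat on the mat the dog sat on the rug".split()
    vectors = context_vectors(tokens, window=1)
    print(nearest_neighbors(vectors, "mat", k=1))  # prints ['rug']
```

In this toy corpus "mat" and "rug" occur only next to "the", so their context vectors point in the same direction. Methods that preserve local neighborhoods, such as LLE, start from nearest-neighbor relations of this kind rather than from global distances.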
