CAREER: Statistical Learning Theory for Natural Language Processing: Theory, Algorithms and Representations

$500,000 | FY2004 | CSE | NSF

Massachusetts Institute Of Technology, Cambridge MA

Investigators

Abstract

This project focuses on the further development of statistical methods for natural language processing (NLP). It involves research in three core theoretical or algorithmic areas and in three application areas within NLP. The core areas are supervised learning for natural language tasks; efficient search and parameter estimation, particularly through connections between NLP models such as weighted context-free grammars and work on search and estimation in graphical models; and unsupervised or partially supervised learning. The three application areas are syntactic analysis, learning in natural language interfaces, and learning approaches to information extraction. A first aim is to further develop a theoretical understanding of, and new algorithms for, machine learning approaches to natural language. A second aim is to apply these algorithms to natural language problems, and in particular to develop an understanding of the representations that lead to good performance on these tasks. Improvements in core technologies for machine learning in natural language processing domains could have an impact on a wide range of NLP applications. Our aim is to make it relatively simple to build a natural language interface, or information extraction system, in a new domain. Advances in our theoretical understanding of learning in language problems could also have an impact on linguistics, in studies of learnability or language acquisition. Finally, an important goal of the research and educational plan is to build connections between computational linguistics and other fields, particularly machine learning and probabilistic representation and reasoning in artificial intelligence.
