ITR: Domain-Independent Semantic Interpretation
University of Colorado at Boulder, Boulder, CO
Investigators
Abstract
The goal of this project is to build a domain-independent, language-independent, statistically trained shallow semantic interpreter: a sophisticated, robust, widely available tool that takes text sentences as input and produces a "shallow" representation of each sentence's meaning. The research addresses key scientific questions, including the nature of the automatically extractable linguistic features that link surface form with semantic structure, the extent to which features and representations are robust across languages, and how supervised and unsupervised learning techniques can be combined. The shallow semantic interpreter works in both English and Chinese by analyzing sentences into propositions, each consisting of a predicate together with its arguments, labeled with semantic roles (Agent, Patient, Instrument, Location, etc.). The project relies on a statistical machine-learning paradigm. It combines supervised approaches, in which features are extracted from two large semantically labeled databases (FrameNet and PropBank), with unsupervised and lightly supervised methods such as clustering with graphical models, co-training, and active learning, all drawing on a variety of sophisticated linguistic features. The results of the project will allow shallow semantic representations to be incorporated into a wide range of key Natural Language Processing applications, including information extraction and machine translation. The research will also provide important knowledge about the linguistic features that serve as cues to meaning. Finally, the project will provide key publicly available databases for other research. These include a combined FrameNet and PropBank database, which will be a rich resource for applications using the Semantic Web and ontologies in general, as well as a richly annotated Chinese PropBank and aligned English and Chinese corpora useful for machine translation research.
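As a rough illustration of the kind of output described above, a proposition with a predicate and role-labeled arguments might be represented as follows. This is a minimal sketch only: the class names `Argument` and `Proposition`, the `roles` helper, and the example sentence are hypothetical and are not taken from the project, FrameNet, or PropBank.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Argument:
    """One argument of a predicate (hypothetical structure)."""
    role: str  # semantic role label, e.g. "Agent", "Patient", "Instrument"
    text: str  # the span of the sentence that fills the role


@dataclass
class Proposition:
    """A predicate plus its role-labeled arguments (hypothetical structure)."""
    predicate: str
    arguments: List[Argument] = field(default_factory=list)

    def roles(self) -> Dict[str, str]:
        """Map each role label to the text that fills it."""
        return {arg.role: arg.text for arg in self.arguments}


# Hand-written analysis of "Kim opened the door with a key."
# This is an illustration, not output produced by the project's interpreter.
example = Proposition(
    predicate="open",
    arguments=[
        Argument(role="Agent", text="Kim"),
        Argument(role="Patient", text="the door"),
        Argument(role="Instrument", text="a key"),
    ],
)

print(example.roles())
```

A flat predicate-plus-arguments record of this sort is what makes the representation "shallow": it captures who did what to whom without attempting a full logical form.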