RI-Small: Improving Machine Learning Approaches to Coreference Resolution

$250,000 | FY2008 | CSE | NSF

University of Texas at Dallas, Richardson, TX

Investigators

Abstract

Despite recent successes in machine learning approaches to coreference resolution, the performance of recently developed coreference systems is plateauing. Progress on the coreference task is currently limited in part by the over-reliance on morpho-syntactic knowledge sources, which has led to the induction of overly simplistic coreference heuristics. To bring learning-based coreference resolvers to the next level of performance, this work adopts a knowledge-rich approach, investigating a variety of semantic and discourse knowledge sources for coreference resolution. Specifically, it develops corpus-based methods for inducing semantic features, leveraging publicly available lexical databases and advanced machine-learning and inference techniques. In addition, it broadens the kind of discourse knowledge exploited by existing learning-based coreference systems, generating features for capturing not only the salience of a noun phrase but also the coherence of a text, via the use of discourse segmenters and parsers that are grounded in Centering Theory, Rhetorical Structure Theory, and Grosz and Sidner's discourse theory. The main results of this work will be to demonstrate the benefits of a knowledge-rich approach to learning-based coreference resolution, and to re-introduce the linguistically motivated discourse theories developed in the late 1970s and early 1980s into statistical coreference resolution models. The data sets produced in this research will be made available to the research community, and the experimental results will be disseminated via publications.
