SGER: Enriching Parser Output for Treebank Construction
University of Pennsylvania, Philadelphia, PA
Investigators
Abstract
The construction of treebanks for linguistic and natural language processing (NLP) research has become increasingly widespread over the past decade, beginning with the Penn Treebank and the Penn-Helsinki Parsed Corpora of Historical English and now extending to corpora of other languages, both modern and historical. The methods used to construct these treebanks are partially automated but require extensive manual correction, which slows production and introduces a certain level of inconsistency into the output. The present project arises from the urgent need for treebanking efforts to produce more accurate output more rapidly. With National Science Foundation support, Dr. Anthony Kroch, Dr. Seth Kulick, and Dr. Mitch Marcus will improve the automated tools for corpus construction, applying recently developed techniques to enrich parser output while preserving bracketing accuracy. The primary goal is rapid deployment for treebanking, but the project also aims to improve the descriptive adequacy of NLP technology on a more fundamental level. These more fundamental improvements should have important implications for increasing the power and practical utility of the technology in a range of applications beyond treebanking itself. The intellectual merit of this proposal is that it will extend the power of current methods of linguistic research; its broader impacts lie in the envisaged improvements to natural language technology for practical applications such as information retrieval and machine translation.