RI: Exploiting and Exploring Discourse Connectivity: Deriving New Technology and Knowledge from the Penn Discourse Treebank
University Of Pennsylvania, Philadelphia PA
Investigators
Abstract
Large scale corpora annotated at the sentence level have played a critical role in natural language research. They have enabled large scale integration of statistical knowledge (derived from the corpora) with linguistic knowledge leading to both technological and scientific applications, such as information extraction, question answering, summarization, and machine translation, among others. This approach is now being extended to the discourse level, thus going beyond the sentence level. Using a resource called the Penn Discourse Treebank (PDTB), a large scale corpus annotated with discourse structure along with the associated semantics, new major experimental work on discourse processing is being carried out, leading to the generation of more coherent summaries and texts, extraction of complex relations in texts, among others, as well as foundational research relevant to language technology. This work is also providing a deeper understanding of the relationship between sentence level and discourse level structures. While pursuing these goals, a variety of tools for making a productive use of the PDTB resource are also being developed. This research program is also coupled with a strong educational program involving training researchers in the PDTB methodology so that similar resources can be developed in other languages substantially divergent from English. This part of the research program has international components including collaboration with research groups in Czech Republic, India, and Finland. The international collaboration is funded by the NSF Office of International Science and Engineering.
View original record on NSF Award Search →