
CAREER: Semantic Divergences Across the Language Barrier

$549,930 | FY2018 | CSE | NSF

University of Maryland, College Park, College Park, MD

Investigators

Abstract

Despite the explosion of online content worldwide, much information is currently isolated by language barriers. While multilingual users and translators can help, the diversity and scale of online content make it impossible for humans alone to break the language barrier. Automated tools are needed to support and supplement their work. This project introduces computational representations and methods to compare and contrast the meaning of text in different languages. The resulting models will be useful for developing language technology that supports cross-lingual communication and cross-cultural understanding, including by augmenting machine translation and by providing support for second language learners, volunteer translators, and security analysts.

This CAREER project integrates research with education by using activities motivated by the practical problem of translating Wikipedia to illustrate the challenges of language technology developed on inevitably biased data. These activities target high-school and undergraduate students outside of computer science, as well as computer scientists of diverse backgrounds at the undergraduate and graduate levels.

Cross-lingual work in natural language processing currently relies on the assumption that a source text and its translation are equivalent in meaning in the two languages, and that they can be decomposed into smaller equivalent units by aligning sentences, phrases, and words. Yet content conveyed in two languages is rarely exactly equivalent: the same topics or events can be discussed from widely different perspectives, and even faithful translations can be hard to understand without the appropriate linguistic and cultural background knowledge. Building on and connecting distinct bodies of work on machine translation and semantic analysis, this project provides techniques to detect and explain nuanced differences between words and sentences in different languages. We characterize semantic divergences, that is, differences in meaning across languages, using an expressive set of semantic relations between words and sentences. We use the resulting models to improve machine translation quality and to explain translations to readers of various backgrounds.
