EAGER: Formal and Empirical Foundations of Semantics-Preserving Machine Translation

$149,896 | FY2013 | CSE | NSF

Johns Hopkins University, Baltimore MD

Investigators

Abstract

Statistical machine translation has been enormously successful over the last two decades, resulting in what is today a thriving industry highlighted by offerings such as Google Translate. Yet translation systems still often fail to preserve the semantics of sentences, that is, the "who did what to whom" relationships that they express. This is because they model translation as simple substitution and permutation of words, or at best as the reordering of syntactic units, such as nouns and adjectives. To preserve semantics, they must model semantics. At the same time, computational linguists have developed rigorous, expressive mathematical models of language that exhibit high empirical coverage of semantically annotated linguistic data, correctly predict a variety of important linguistic phenomena in many languages, and can be processed with highly efficient algorithms. However, these models are untested as the basis of statistical translation models. This EArly Grant for Exploratory Research aims to close the gap, building the foundations of empirical semantics-preserving transduction models based on modern, linguistically informed mathematical models of language. The project derives new mathematical functions that map linguistically expressive representations from one language to another, and implements them to align translated documents and translate new documents. Though high-risk, this exploratory project has the potential to unify and transform the disparate fields of empirical machine translation and theoretical computational linguistics.
