RI: Small: Low-Latency and High-Quality Simultaneous Translation

$465,980 | FY2020 | CSE | NSF

Oregon State University, Corvallis OR

Investigators

Abstract

Simultaneous language translation (interpretation) is widely used in settings such as multilateral organizations like the United Nations, international summits and conferences, and legal proceedings. However, concurrent perception and production in two languages makes this task extremely challenging and exhausting for humans. The number of professional simultaneous interpreters worldwide is extremely limited, and they have to work in teams of two or three because each interpreter can sustain the task for only about 15-30 minutes. There is therefore a critical need to develop simultaneous translation techniques that reduce the burden on human interpreters and make this service more accessible and affordable. Yet simultaneous translation is also notoriously difficult for machines, and accomplishing it consistently and reliably is considered one of the holy grails of Artificial Intelligence. Various methods have been proposed to solve this problem, but they share three major limitations: (a) their translation model is still a full-sentence translation model; (b) they cannot achieve short latencies such as the "3-second delay" common in human interpretation; and (c) their systems are complicated and difficult to train. This project therefore aims to develop new algorithms, techniques, and datasets for high-quality simultaneous machine translation with minimal delay (low latency). The technologies developed by this project will make simultaneous translation more affordable and accessible, which will improve the efficiency of human communication across linguistic barriers. This project also supports STEM education of underrepresented minorities (who do not speak English natively) by recruiting them into machine translation studies.
Building on the principal investigator's successful prior work, the key idea in this project is to discard the conventional full-sentence translation paradigm and the classical sequence-to-sequence framework, which process the full input sentence before starting to translate and are thus ill-suited to simultaneous translation. Instead, this project adopts a "prefix-to-prefix" framework, which starts translating after processing only a few input words, mimicking human interpreters. Though extremely simple, this framework achieves low latency and high translation quality. Using this framework, this project aims to (1) develop an algorithm to detect and fix anticipation mistakes on the fly, and explore new evaluation metrics that work for translations with revisions; (2) develop dynamic and flexible translation strategies to balance quality and latency; (3) construct better training data for simultaneous translation by revising the reference translations in a parallel text to remove unnecessary reorderings; and (4) apply the prefix-to-prefix framework to incremental text-to-speech synthesis (TTS), thus completing the end-to-end simultaneous speech-to-speech pipeline, then improve that pipeline's quality and latency and compare it with human simultaneous interpreters. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
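To make the prefix-to-prefix idea concrete, the following is a minimal sketch of one possible decoding loop, not the project's actual system. It assumes a fixed-lag schedule, which the abstract does not specify: the decoder reads `k` source words, then alternates between writing one target word and reading one more source word. The names `simultaneous_decode`, `toy_step`, `k`, and `max_len` are hypothetical, and `toy_step` is a stand-in for a trained model that simply copies and uppercases words so the read/write interleaving is easy to see.

```python
from typing import Callable, Iterable, List, Optional, Tuple

# Hypothetical model interface: given the source prefix read so far, the
# target prefix written so far, and whether the source has ended, return
# the next target token, or None to stop.
StepFn = Callable[[List[str], List[str], bool], Optional[str]]


def simultaneous_decode(
    source_stream: Iterable[str], k: int, step: StepFn, max_len: int = 200
) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Fixed-lag prefix-to-prefix decoding (an assumed policy, for illustration).

    Each target word is produced from a source *prefix* only; the full
    sentence is never required before translation starts.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    stream = iter(source_stream)
    src_prefix: List[str] = []
    tgt_prefix: List[str] = []
    events: List[Tuple[str, str]] = []  # log of READ / WRITE actions
    exhausted = False
    while len(tgt_prefix) < max_len:
        # READ until the source prefix is k words ahead of the target prefix,
        # or until the source ends.
        while not exhausted and len(src_prefix) < len(tgt_prefix) + k:
            try:
                word = next(stream)
            except StopIteration:
                exhausted = True
            else:
                src_prefix.append(word)
                events.append(("READ", word))
        # WRITE one target word conditioned only on the prefixes seen so far.
        token = step(src_prefix, tgt_prefix, exhausted)
        if token is None:
            break
        tgt_prefix.append(token)
        events.append(("WRITE", token))
    return tgt_prefix, events


def toy_step(src_prefix: List[str], tgt_prefix: List[str], source_done: bool) -> Optional[str]:
    """Toy stand-in for a translation model: copy the aligned source word in uppercase."""
    t = len(tgt_prefix)
    if t < len(src_prefix):
        return src_prefix[t].upper()
    return None  # nothing left to translate


if __name__ == "__main__":
    output, log = simultaneous_decode(["a", "b", "c", "d"], k=2, step=toy_step)
    print(output)
    print([action for action, _ in log])
```

With `k=2`, the first target word is written after only two source words have been read, and the remaining target words are flushed once the source ends. A smaller `k` lowers latency but gives the model less context per word, which is the quality-latency trade-off that aim (2) targets.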
