RI: Small: Modeling Prosody for Speech-to-Speech Translation

$632,000 | FY2024 | CSE | NSF

University of Texas at El Paso, El Paso, TX

Investigators

Abstract

Machine translation research has made astounding progress, from text-to-text, to speech-to-text, and most recently to speech-to-speech. However, speech-to-speech translation works well only for certain use cases. This project will enable the development of systems that better support people not only talking at, but also talking with, speakers of different languages. Specifically, it will focus on the aspects of communication beyond words, including pitch and other prosodic features. The ultimate outcome will be speech-to-speech translation systems that support deeper communication among people, both across national boundaries and within our language-diverse nation, empowering individuals and strengthening social cohesion. Further, by increasing knowledge of how prosody supports effective communication in dialog, this project will ultimately enable language teachers and others to help people communicate better even when unaided by technology. In addition, better representations and methods for modeling prosody and the pragmatic aspects of language will enable artificial intelligence systems (smart speakers, smartphones, smart cars, robots, and so on) to better support their users, in more contexts and in more languages.

More technically, this project will address issues in prosody, a major barrier to widening the utility of speech-to-speech translation. Despite substantial research on prosody in speech technology, linguistics, and psycholinguistics, the field lacks good models of how prosody works for pragmatic functions and knowledge of how it can most effectively be modeled. While prosody modeling is an active area of research, especially for speech synthesis, there are no generally useful computational models of prosody as it serves pragmatic functions. While accurately describing the prosody of various languages is also an active area of research, there are no quantitative models of how prosodic forms and functions map across languages.

Accordingly, the aims of the proposed project are to advance knowledge of how prosody relates across languages and to advance the ability to use machine learning to model these relationships. The driving goal will be the construction of models that take as input an utterance in one language and predict appropriate prosody for its translation in a second language, initially for Spanish and English. Supporting activities will include corpus development, development of ways to measure pragmatic fidelity, organization of a shared task, descriptive case studies of the two languages, and user studies.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
