RI Core: Medium: Structured variability in vocal tract articulation dynamics in speech

$1,200,000 | FY2023 | CSE | NSF

University of Southern California, Los Angeles, CA

Investigators

Shrikanth S Narayanan (contact), Dani Byrd, Krishna S Nayak, Louis M Goldstein

Abstract

Humans produce and use speech to communicate and interact with one another, to convey their thoughts and to express their emotions, in a vast variety of ways. The production of the rich sounds of speech by an individual involves intricate movement and coordination of the vocal organs, such as the tongue and jaw, in a manner that is flexible and adaptive to the context of the interaction. Yet the details of how this flexibility is achieved are not completely known. Speech production can also be affected by a variety of personal circumstances, including illness and disorder. The project will create a scientific foundation for understanding how human speech varies across time, both within and across interpersonal interactions (over hours, days, weeks, months and years), by directly observing and modeling articulation during speech. Such knowledge is fundamental both to advancing speech science and to the design of robust interactive speech technologies. A longstanding goal in speech research is to understand and address the rich and pervasive variability in speech production, both within and across individuals and for varied interactional contexts. Our research investigates questions not approachable via speech acoustics alone. Direct access to dynamic information on vocal tract articulation, complemented by technology and analysis advances, allows us to examine complex behavior associated with speech production variability, namely its flexibility and stability over task and over time. The project will use advanced real-time magnetic resonance imaging (rtMRI) and computational modeling of human vocal tract motion during speech production to understand the structure and control of spoken language communication across timescales, both within individual experience and across interpersonal interactions. This offers an unprecedented opportunity to observe how humans plan and produce speech collaboratively with one another at a level of spatiotemporal detail not possible before.
The project innovations include imaging the vocal tracts of conversing speakers simultaneously and synchronously at two sites to understand speech production behavior during a dialog, mapping how a single individual’s speech production varies naturally and typically over hours, days, months and years, and examining how individual differences in speech flexibility are predictive of speaker stability. The research program integrates speech science and engineering through empirical work leveraging rich, quantitative, and dynamic articulatory rtMRI data, and will broadly share the unique data, tools and models. The project also has critical applied significance beyond speech technology, as knowledge of normative articulation and its variability can impact the assessment and remediation of speech disorders by helping derive robust speech-based biomarkers for a variety of clinical conditions across the life span, from autism to dementia. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
