Dynamics of Vocal Tract Shaping

$434,750R01FY2010DCNIH

University Of Southern California, Los Angeles CA

Investigators

Linked publications & trials

Abstract

DESCRIPTION (provided by applicant): The long-term goal of this project is to wed state-of-the-art technology for imaging the vocal tract with a linguistically informed analysis of vocal tract constriction actions in order to understand the cognitive control and production of the compositional units of spoken language. We have developed the use of real time MRI to illuminate the inherently dynamic speech production process. Our approach is to observe the time-varying changes in vocal tract shaping and to understand how these emerge lawfully from the combined effects of multiple constriction events distributed over space (subparts of the tract) and over time. An understanding of dynamic vocal tract actions as fundamental to linguistic organization will do much to add to the field's current-basically static-approach to describing speech. In the previous (and first) funding period of the proposal, our team developed and refined our novel real time MRI acquisition ability that has made veridical real-time movies of speech production possible for the first time without X-rays. Data show clear real-time movements of the lips, tongue, and velum, providing exquisite information about the spatiotemporal properties of speech gestures in both the oral and pharyngeal portions of the vocal tract. We also developed novel noise-mitigated image-synchronized strategies to record speech in-situ during imaging as well as signal processing strategies for deriving linguistically-meaningful measures from the data (Bresh and Narayanan, 2009). We have demonstrated the utility of this approach for linguistic studies of speech communication that were hitherto not possible (e.g., Byrd, Tobin, Bresch, Narayanan, 2009;Bresch et al, 2008). Building on these foundational efforts, we situate the specific research aims of our competing renewal proposal as follows. The specific aims of this proposal are to further develop the technology and analysis platform of real-time MRI, which provides the scaffolding for the project, while pursuing speech production studies with an overarching theme of examining the decomposition of speech into cognitively-controlled action units, or gestures. Specifically, we aim to investigate the compositionality of speech in three domains-each being areas of study that are not approachable using exclusively acoustic speech data without direct access to the dynamic information from the entire vocal tract, which can only be supplied with real-time MRI. Our specific aims examine (i) compositionality in space: deployment of concurrent constriction events distributed spatially, that is, over distinct constriction effectors within the vocal tract, (ii) compositionality in time: deployment of constriction events distributed temporally, (iii) compositionality in cognition: deployment of constriction events during speech planning that mirror those observed during speech production. We propose to use the real-time MRI approach we've developed to advance our understanding in all these three aspects of linguistic structuring. Our approach to decomposing speech shaping into multiple discrete events, in space and over time, can be further validated by demonstrating that we can capture the observed data time-functions using a computational model having only discrete gestural input. To do this, we will employ a computational implementation of Articulatory Phonology and Task Dynamics (called TaDA). The model is particularly appropriate because it provides a hypothesized ensemble of gestures arrayed over time for any input utterance. The model is biologically plausible and produces as its output explicit time-functions of constriction events in the vocal tract, which is precisely what we measure directly with real-time MRI. We anticipate a highly synergistic relation between model and data that can bootstrap our understanding of the structure of speech. The model has not, to this point, been optimized using real data, as the appropriate data did not exist before real-time MRI. And, in turn, the use of real-time MRI as a tool in understanding speech depends on having an analytical procedure for relating the observed shaping changes to underlying (multiple) controls, which is what the model provides. The project's final specific aim is to continue to advance our technical real-time MRI approach for investigating the physical realization of phonological structure by: (i) improved image signal to noise ratio through the use of a novel custom 16 receiver head neck coil, (ii) doubling the 2D acquisition frame rate through the use of novel pulse sequences in conjunction with new joint acquisition-processing optimization, and (iii) fast 3D imaging using more sophisticated pulse-sequences to supplement the single plane fast imaging work. These challenges will be pursued in tandem with the design of data-driven analyses suitable for distilling the high-dimensional information provided by real-time MRI and with the synchronized acoustic speech signal, critical for deriving linguistically-meaningful measures. Specifically, we pursue robust and faster image segmentation and articulatory tracking, and methods for dynamical modeling using the derived time series constriction data.

View original record on NIH RePORTER →