Time Course of Spoken Word Recognition

$250,058 · R01 · FY2015 · HD · NIH

University Of Rochester, Rochester NY

Abstract

DESCRIPTION (provided by applicant): Adults can recognize between 50,000 and 100,000 words, spoken by different talkers in varying acoustic environments. This ability is remarkable because speech unfolds as a series of transient acoustic events extending over a few hundred ms, without reliable cues to word boundaries. Despite the ubiquitous success of spoken language processing among the general population, 5 percent of first graders enter school with some type of speech sound disorder that cannot be accounted for by hearing impairment. In addition, once the language system has been successfully acquired, it is susceptible to insult from injury or stroke (accounting for 1 million adults in the U.S. with some form of aphasia). Spoken word recognition plays a central role in language acquisition and spoken language comprehension, allowing for storage of the rich array of syntactic, semantic and pragmatic knowledge that is linked to lexical representations and rapid access to this information during comprehension. A more complete understanding of the perceptual and computational capacities underlying spoken word recognition is essential to advancing understanding of both normal and deviant language acquisition and processing. Because the speech signal unfolds over time and the acoustic realization of a word varies with its local environment, it is important to evaluate spoken word recognition at a fine temporal grain using words embedded in continuous speech. This project has established visual world eye tracking as a powerful tool for examining spoken word recognition. The methods that we have developed are increasingly being used to address questions about spoken language processing in participant populations across the lifespan from infants to older adults, and in normal and impaired populations. The proposed research has two aims. 
The first aim is to evaluate a data explanation framework in which the processing of words in continuous speech is modulated by context-based expectations, which (a) affect how listeners interpret the input and (b) provide a mechanism for rapid perceptual learning/adaptation. We manipulate speech rate and discourse-based information structure to examine how expectations affect real-time integration of asynchronous cues and how cue weights are adjusted through perceptual learning. The second aim focuses on three emerging questions that affect the design, interpretation and analysis of visual world experiments: (1) Are the earliest signal-driven eye movements to pictures (at least partially) mediated by phonological information from displayed pictures, or are eye movements primarily mediated by perceptual representations activated by the spoken word? (2) What is the minimal lag between cues in the speech signal and the first stimulus-driven fixations? (3) Are fixations affected by state-dependencies and, if so, under what conditions, and how can these effects be modeled within an event-based statistical framework?
