SBE-RCUK: CompCog: Modeling the Development of Phonetic Representations

$520,058 | FY2017 | SBE | NSF

University Of Maryland, College Park, College Park MD

Abstract

Listeners' processing of speech is tuned to their native language. For example, Japanese listeners categorizing English [l] and [r] do not rely on the same aspects of the speech signal that native English listeners do. This project uses computational models to investigate how children develop language-specific perceptual strategies. A better understanding of this perceptual learning process could lead to better diagnosis and treatment of developmental language impairments that have a perceptual basis, and could provide insight into the difficulties that listeners face when learning a second language in adulthood. Building computational models of how children learn their native language from the speech around them can also lead to improved speech technology for low-resource languages (languages that are not spoken by many people in the world or that lack digital resources such as large-scale, annotated databases), ultimately leading to systems that learn more effectively using little or no transcribed audio. Such systems could become important tools for documenting and analyzing endangered and minority languages and could help make speech technology more universally available.

A series of simulations tests the hypothesis that children's processing of speech can become specialized for their native language through a process of dimension learning that does not rely on knowledge of sound categories. Two models that use dimension learning are proposed, drawing on representation learning methods that have performed well in low-resource automatic speech recognition, where extensive labeled training data are not available. The first model relies on temporal information as a proxy for sound category knowledge, while the second model relies on top-down information from similar words, which infants have been shown to use. Each model is trained on speech recordings from a particular language and is evaluated on its ability to predict how adults and infants with that language background discriminate sounds. The research will yield new methods for training and testing cognitive models of language with naturalistic speech recordings and has the potential to significantly impact theories of how and when children learn about the sounds of their native language.
