Collaborative Research: The Individual Differences Corpus: A resource for testing and refining hypotheses about individual differences in speech production

$103,992 | FY2023 | SBE | NSF

Regents Of The University Of Michigan - Ann Arbor, Ann Arbor MI

Abstract

Speaking is a surprisingly personal activity. When humans articulate the vowels and consonants that make up words, or produce the rhythm and melody of a sentence, they do so in ways that differ from each other – even from others who speak the same language. And while some of the differences in a person’s pronunciation reflect the specific region they come from or the social group they belong to, most of the variation in people’s speech remains poorly understood. This lack of understanding presents challenges to society – surprisingly practical ones, such as how speech and language disorders can be diagnosed (for example, which kinds of differences are pathological and which kinds aren’t?) as well as how technological applications such as automatic speech recognition operate (for example, which kinds of differences cause problems for a speech recognition system and which ones don’t?). The goal of this project is to produce a corpus of speech data – the first-ever publicly available corpus of its kind – that can be used to explore the ways and reasons that people differ in their speech patterns. The corpus – the Individual Differences Corpus – includes tens of thousands of words produced by hundreds of native English speakers, providing researchers with the data needed to test scientific hypotheses about how a range of mental skills (e.g., memory, attention) and personality characteristics (e.g., autistic traits, empathy) influence people’s speech, with implications for how researchers approach speech-related differences in social, educational, technological and clinical contexts.

Speech signals are rife with variation. Some of this variation derives from the form of the message itself (i.e., effects of phonetic and/or phonological context), while some derives instead from the speaking context (e.g., the need to produce faster, clearer or less ambiguous speech). However, some of the variation found in speech has its origins in speakers themselves – i.e., individual differences. But what aspects of speakers and listeners cause them to vary, and what can they tell us about the language and speech production systems? The present research aims to create the Individual Differences Corpus, a publicly available corpus resource designed for approaching questions about individual differences in speech production. The corpus is unique in that it pairs (1) thousands of words of connected speech produced by hundreds of native English speakers with (2) a large battery of measurements of all speakers’ cognitive and social profiles, including psychometrically valid measurements along several dimensions of cognitive control (e.g., working memory, processing speed, inhibition), cognitive processing styles (e.g., autistic traits, empathy) and more. The theoretical and empirical potential of the corpus is demonstrated in two psychometric studies of speech production planning that investigate planning from both prosodic and segmental perspectives.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
