Doctoral Dissertation Research: The Interaction of Pitch and Timing in the Perception of Prosodic Grouping

$7,884 | FY2015 | SBE | NSF

Trustees Of Boston University, Boston

Investigators

Abstract

Speakers break the otherwise continuous stream of their speech into smaller, meaningful segments, the edges of which are marked by audible cues such as pauses, rate changes and pitch movement. Prosodic boundaries, as these segment edges and the cues that mark them are called, are known to play a critical role in language processing and spoken language acquisition. While great progress has been made in documenting the range of cues that mark boundaries, much is still not understood about the cognitive processes listeners use to make sense of these cues when interpreting the speech stream. The signaling of a boundary involves multiple cues from timing and pitch. Current models of prosodic boundaries, such as those used in spoken language processing and text-to-speech (TTS) systems, rely heavily on timing cues. Pitch cues are typically considered merely to support timing cues, and are sometimes even considered redundant. However, growing evidence suggests that pitch and timing interact in perception, including speech and non-speech research demonstrating pitch-based distortions of perceived duration. This dissertation project seeks to enhance our knowledge of the psychological processes involved in boundary perception through empirical work on the perceptual interaction, integration and weighting of acoustic cues that have typically been measured independently. Quantifying these interrelations will inform models of boundary detection in spoken language processing, and of boundary generation in synthetic speech, by reflecting a richness of prosodic structure that is lacking in models that rely primarily on objective duration measures.
Synthetic speech has been shown to impose a higher cognitive load (lower intelligibility, lower recall) than natural speech, and is especially challenging for populations such as non-native speakers, aging and/or hearing-impaired speakers, or those with language disorders. Redundant acoustic cues are known to increase speech comprehensibility in noisy conditions, and may lighten listeners' cognitive load. Increasing the strength of pitch-based cues that facilitate boundary processing may increase both the intelligibility and naturalness of synthetic speech and TTS-based assistive technologies.
