IIS/HCI: Modeling Idiosyncrasies in Speaking Behavior

$1,535,065 | FY2003 | CSE | NSF

International Computer Science Institute, Berkeley CA

Investigators

Nikki Mirghafori (contact), Elizabeth E Shriberg

Abstract

Speaker-characteristic information is encoded at multiple levels in the speech signal, including low-level acoustic features of the speech spectrum, prosodic patterns, distinctive pronunciation and word usage, and even an unusual laugh or other idiosyncratic vocal gestures. Yet most speaker recognition systems today rely exclusively on the lowest-level features, breaking the speech stream into a series of 10-20 msec frames modeled essentially as independent events. This project explores higher-level sources of speaker-distinctive information via two feature discovery tracks: one building on existing linguistic constructs and guided by insights from psycholinguistics and human performance studies; the other a purely data-driven approach, seeking idiosyncratic "vocal performances" -- spectro-temporal patterns with high speaker-characterizing power, independent of linguistic constraints. Supplementing this core feature discovery program, project efforts include work in feature selection and combination, work to encode these information sources into more effective speaker models, and evaluation of these models primarily for speaker recognition, but also in other speech technology applications. On the theoretical side, this work should lead to a better understanding of what makes a speaker's voice and speech behaviors unique, what constitutes normal within-speaker variation, and what dimensions are important for modeling speaking style. Further, the resulting speaker models should have numerous practical applications: improving speaker recognition systems, providing more speaker-focused models for automatic transcription, facilitating analysis of speaker behavior, and supporting more personalized human-computer interaction. In addition, this work will make resources available to the broader research community through the dissemination of feature datasets and through educational activities.
