RI: Small: Modeling Idiosyncrasies of Speech for Automatic Spoken Language Processing

$449,999 · FY2016 · CSE · NSF

University of Washington, Seattle, WA

Investigators

Abstract

Spoken language encodes significant information in pitch and energy dynamics (prosody) and in disfluencies (self-edits) that human listeners use to understand a talker's meaning and the social/emotional context. Due to a lack of adequate models of these phenomena, current speech processing systems make little use of this information. This project tackles modeling limitations by focusing on unexpected speech phenomena, assuming that these events often carry the most valuable information, and by working with speech from a variety of social contexts. The work has applications that range from literacy assessment to improved human-computer interaction. Further, understanding the communicative role of different disfluencies in non-clinical speech will lead to more accurate clinical diagnoses. Educational aspects aim at broad exposure of the research methods to a diverse group of students at all academic levels through short courses, student TED talks, and work with a UW program for attracting and retaining low-income students in STEM fields.

The goal of this project is to develop computational models that extract information from prosodic cues and disfluencies for use in a variety of spoken language processing applications. The approach leverages multiscale context in predictors of expected acoustic dynamics of speech in order to automatically identify regions of atypical timing or exaggeration. Specifically, it uses deep neural networks with parallel text and acoustic inputs to represent local dynamics in combination with point process models to characterize global rates of atypical events. Linguistic analyses and crowd-sourced perception studies are used to determine types of anomalies that are information-bearing (vs. noise that should be ignored in language processing), leading to improved speech understanding models. Experiments make use of a variety of data sources to assess adaptation strategies and ensure generalizability of findings. Evaluation of computational models is in the context of multiple downstream applications in order to broadly explore potential contributions.
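
As a rough illustration of the idea of comparing observed speech dynamics against expected ones and then summarizing how often atypical events occur, the following minimal Python sketch may help. It is not the project's implementation: the expected values are simply given as a list (standing in for the output of a neural predictor), the z-score threshold of 2.0 is an arbitrary assumption, the point process is assumed to be a homogeneous Poisson process, and the function names `flag_atypical` and `poisson_rate` are hypothetical.

```python
from statistics import mean, pstdev


def flag_atypical(observed, expected, threshold=2.0):
    """Return indices whose standardized residual exceeds the threshold.

    `expected` stands in for the output of a predictor of expected
    acoustic dynamics (assumption: here it is just a list of numbers).
    """
    residuals = [o - e for o, e in zip(observed, expected)]
    mu = mean(residuals)
    sigma = pstdev(residuals)
    if sigma == 0:
        return []  # no variation, so nothing stands out
    return [i for i, r in enumerate(residuals)
            if abs(r - mu) / sigma > threshold]


def poisson_rate(num_events, total_seconds):
    """Maximum-likelihood rate (events per second) of a homogeneous
    Poisson process: the event count divided by the observation time."""
    if total_seconds <= 0:
        raise ValueError("total_seconds must be positive")
    return num_events / total_seconds


# Hypothetical word durations in seconds; the fifth word is strongly lengthened.
observed = [0.20, 0.22, 0.19, 0.21, 0.80, 0.20, 0.18, 0.21]
expected = [0.20] * len(observed)

atypical = flag_atypical(observed, expected)
rate = poisson_rate(len(atypical), total_seconds=4.0)
print(atypical, rate)
```

In this toy setup the residual plays the role of "unexpectedness" at the local level, while the estimated rate is a global summary of how often such events occur in a stretch of speech.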
