CAREER: More than Words: Advancing Prosodic Analysis

$37,741 | FY2014 | CSE | NSF

CUNY Queens College, Flushing, NY

Investigators

Abstract

Prosody is an essential component of human speech. Whereas the words are "what is said", prosody is "how it is said". A wealth of information is communicated via prosody, including information about a speaker's intent and state (speaking style and emotion). To advance the capabilities of machines to understand human speech, this CAREER project develops new representations of prosody and applies them to a variety of spoken language processing tasks: word recognition, speaking-style recognition, dialog-act classification and speaker identification. This project employs and advances semi-supervised and unsupervised representation learning techniques to characterize prosody. This project also investigates prosody across multiple languages. Speakers of multiple languages contribute speech and annotate some basic prosodic phenomena (phrasing and prominence). The overarching goal is to identify a compact and universal representation of prosody that will be employed effectively in spoken language processing tasks across languages. Scientific results, representations and tools for extraction will be made open-source, as will the collected, annotated multilingual data.

Speech recognition is being integrated into our lives through mobile devices and spoken dialog systems. The next great hurdle in the ability to communicate with machines via speech is understanding prosody. Taking prosody into account will result in machines understanding humans better; conversely, automatically generating adequate prosody to convey intent will allow machines to sound more human. Both types of improvement are sorely needed as automated conversation agents and robots are starting to become a part of our everyday lives.

Finally, this project implements an innovative and challenging education plan that is well integrated with its research. It includes curriculum modules on prosodic analysis and representation learning to be widely disseminated. Moreover, undergraduate students who provide and annotate speech samples for the project will get a hands-on introduction to computer science research, and will be compensated in part with tuition waivers for introductory courses in computer science.
