CAREER: The Listening Machine - Sound Source Organization for Multimedia Understanding

$499,897 | FY2003 | CSE | NSF

Columbia University, New York NY

Investigators

Abstract

This is a 5-year continuing award. The goal of the Listening Machine project is to develop techniques that can analyze everyday sonic environments into objects and events similar to those perceived by human listeners. Rather than viewing an audio file as an opaque, undifferentiated dataset, we want to treat it as a combination of sound-source objects with more or less specific characteristics: for instance, the voice of a particular person, plus background hum, plus a door closing followed by footsteps. The recognition analyses to be developed by the PI will make possible rapid browsing of recordings by segmenting and summarizing them in terms of the sounds of the objects they contain, such as a discussion among multiple people, a ride inside a vehicle, or time spent outside on the street. Analyzing such recordings in terms of separate objects is particularly challenging, because the sounds almost always overlap and interfere. It is not adequate simply to train a recognizer on the sound of, say, a telephone ringing, because that sound changes completely when it is heard against a background of music on the radio. Instead, the PI will adapt and develop emerging techniques from speech recognition that classify signals based on partial observations. These techniques simultaneously recognize subsets of the time-frequency information and infer a segmentation into regions dominated by different sound sources, so as to maximize the probability that the explanation matches the observations. The techniques to be devised will support a new range of applications in which machines can stand in for human listeners, including new ways to search online multimedia content, interactive robots with a human-like awareness of their environments, and prosthetic devices for the hearing impaired.
The educational contributions of the project will center around a Laboratory for Recognition and Organization of Speech and Audio (LabROSA), which the PI will establish with a unique focus on intelligent analysis of general audio. In addition to supporting students working within the group, research results will feed into course- and project-based educational activities in the form of new topic content and new tools and demonstrations. The immediacy of audio illustrations developed as part of this work will make these ideal demos for Columbia University's outreach to local high school students through periodic Engineering Open Houses.
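
As an illustration of the "classify signals based on partial observations" idea in the abstract, the following is a minimal sketch and not the project's actual method. It assumes each sound class is modeled by a diagonal Gaussian over a handful of frequency channels, and that a mask marking which channels are dominated by the target source is already given (the abstract describes inferring that segmentation jointly, which this sketch does not attempt). The names `masked_log_likelihood`, `classify_partial`, and `MODELS`, as well as all numeric values, are hypothetical.

```python
import math


def masked_log_likelihood(x, mask, mean, var):
    """Log-likelihood of x under a diagonal Gaussian, summed only over
    channels where mask is True (unreliable channels are marginalized out)."""
    total = 0.0
    for xi, keep, mu, v in zip(x, mask, mean, var):
        if keep:
            total += -0.5 * (math.log(2 * math.pi * v) + (xi - mu) ** 2 / v)
    return total


def classify_partial(x, mask, models):
    """Return the best-scoring class name and the per-class scores."""
    scores = {
        name: masked_log_likelihood(x, mask, m["mean"], m["var"])
        for name, m in models.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical toy models: energy in 4 frequency channels (low to high).
MODELS = {
    "phone_ring": {"mean": [1.0, 1.0, 8.0, 8.0], "var": [1.0] * 4},
    "door_slam": {"mean": [8.0, 8.0, 1.0, 1.0], "var": [1.0] * 4},
}

# A phone ringing over music: the music fills the two low channels.
observed = [9.0, 9.0, 8.2, 7.9]

all_channels = [True, True, True, True]   # naive: trust every channel
high_only = [False, False, True, True]    # trust only ring-dominated channels

naive_label, _ = classify_partial(observed, all_channels, MODELS)
masked_label, _ = classify_partial(observed, high_only, MODELS)
print(naive_label, masked_label)
```

In this toy setup, scoring every channel lets the background music pull the decision toward the wrong class, while restricting the score to the channels attributed to the target recovers the intended label. A fuller system would also have to search over candidate masks, which requires a model of the background so that scores from different masks are comparable.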
