CAREER: Intelligent Multi-Media Environments
Harvard University, Cambridge MA
Investigators
Abstract
The overall goal of this research on intelligent multi-media environments is to produce automated, relevant, high-quality signal capture in enclosures populated with a variety of sensors (e.g., microphones and video cameras) without requiring active supervision by the system's human users or distracting them. This involves incorporating several information modalities and processing methods. In particular, the fusion of raw acoustic and visual data provides an effective means of producing both high-quality speech and video. The video stream is actively analyzed to make intelligent decisions about camera selection and image framing and to evaluate the semantic content of the proceedings. Novel multi-channel speech processing methods are employed to enhance distant-talker speech acquired in the presence of background noise, reverberation, and competing signals. A number of practical scenarios stand to benefit from advances in this field. These include environments with adverse noise conditions and multiple active (possibly uncooperative) talkers, such as video-conferencing and satellite classroom environments, stock market floors, military operations settings, surveillance procedures, theater or sporting events, and biomedical contexts.
This research comprises three project areas addressing the intelligent multi-media problem described above: 1) acoustic and visual-based systems to track and analyze participants, 2) multi-channel, model-based approaches for distant-talker speech acquisition, and 3) hybrid continuous-discrete methods for speech enhancement using multi-modal data. These three project areas cover a large subset of the intelligent multi-media problem and are inherently linked to one another. For instance, the speech acquisition and enhancement projects depend upon source location data provided by the hybrid tracker, while the tracker benefits from the improved acoustic signals resulting from the speech enhancement work.
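To illustrate how multi-channel speech acquisition can depend on source location data, the following is a minimal sketch of a delay-and-sum beamformer. It is not the method proposed in this project, only a textbook technique chosen for illustration. The function names, the assumed speed of sound, and the restriction to whole-sample delays are all assumptions made for this example.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at room temperature


def steering_delays(mic_positions, source_position, sample_rate):
    """Per-microphone arrival delays in whole samples, relative to the nearest microphone.

    Hypothetical helper: the source position would come from a tracker.
    """
    distances = [math.dist(m, source_position) for m in mic_positions]
    nearest = min(distances)
    return [round((d - nearest) / SPEED_OF_SOUND * sample_rate) for d in distances]


def delay_and_sum(channels, delays):
    """Advance each channel by its delay, then average the time-aligned samples."""
    length = min(len(ch) - d for ch, d in zip(channels, delays))
    return [
        sum(ch[n + d] for ch, d in zip(channels, delays)) / len(channels)
        for n in range(length)
    ]
```

Given a talker position estimated by a tracker, `steering_delays` computes how much later the speech reaches each microphone, and `delay_and_sum` aligns and averages the channels so that speech from that position adds coherently while uncorrelated noise tends to average down. A practical system would likely need fractional delays and some handling of reverberation, which this sketch omits.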