Multi-Channel Information Processing for Intelligent Collaboration
Georgia Tech Research Corporation, Atlanta GA
Investigators
Abstract
In collaborative sessions, information presented or generated is usually unstructured. A traditional tape recording of a meeting, for example, does not give a clear picture of who said what and when, which limits the usability of such information in contexts other than a simple replay. Furthermore, in face-to-face interactions participants make full use of human sensory capabilities such as binaural hearing and binocular (stereo) vision to share and exchange information. To enable intelligent reuse of collaborative session information at a future time, or to support real-time collaborations involving parties located remotely, we need to be able to acquire, process, integrate and organize speech, audio and multimedia information so as to provide a sense of full-dimensional immersive interaction. Among other things, this implies that individual sources must be accessible for life-like playback free of interference from other information streams. In this project the PIs will develop technologies to these ends, focusing in particular on achieving spatialized audio with little or no interference due to reverberation or noise, individual tracking of information sources, and rich and accurate annotation that supports efficient storage and retrieval of the information in a session. The PIs plan to meet these challenges through the use of multi-channel acoustic signal processing, which involves multi-channel signal acquisition, processing, representation and synthesis. Their approach to multi-channel signal acquisition will exploit the ubiquity of individual access (and transduction) devices such as the cell phones and PDAs that most people carry nowadays. During a meeting, these devices can be fitted with the necessary interfaces to form a distributed information acquisition network, so that the signals generated in the meeting can be captured and transmitted to a wireless base station for processing.
The project will result in the following key technical innovations: a fundamental formulation of acoustic signal processing as a multi-input-multi-output (MIMO) problem; a practical solution to "sound objectization" that takes ubiquitous acoustic signals as input and produces as output an array of signals with known source identities; effective organization of the multi-channel acoustic signals in a collaborative session for efficient retrieval; and modeling and mitigation of non-linearity due to speech and audio coding in distributed acoustic signal processing. Broader Impacts: This research will lead to a modern multi-channel signal processing theory for acoustic signal processing, along with a new generation of technology for remote collaboration and conferencing that allows participants to conduct full-dimensional information sharing, as well as a new generation of technology for collaborative session recording and playback that revolutionizes meeting-room facilities by enabling efficient retrieval of precise information. Because collaborative applications of all kinds lie at the heart of the emerging information society, this work will ultimately lead to substantial enhancements in productivity.