CAREER: Co-analysis of Signal and Sense for Understanding Non-verbal Communications and their Applications
University of Memphis, Memphis, TN
Abstract
The objective of this research is to advance our understanding of prosodic relationships, and their synchronization, between verbal and nonverbal communication modes. Prosody and kinesics (such as hand gestures, head nods, facial expressions, and posture) play a crucial role in everyday communication, both by adding expressiveness and by structuring information. Although there are different types and levels of synchronization across these modalities, their exact mapping remains unclear. To date, cross-modal synchronization between verbal and nonverbal modalities has been explored mostly at the semantic and discourse levels; in this work, the PI will instead focus on the interplay between them at various levels of granularity. In particular, speech will be co-analyzed with kinesics, using language and discourse models, to uncover prosodic correspondences, which in turn will be used to develop novel algorithms for modeling dialog acts, emotions, and other kinesic behaviors. The project has two primary goals: (1) to develop computational methods and software tools that iteratively uncover prosodic relationships between nonverbal and verbal behaviors, and (2) to use the derived prosodic relationships and their synchronization to develop novel computational methods and software tools for the robust recognition of gestures, facial expressions, emotions, head nods, and dialog acts. The PI thereby hopes to lay the foundation for a framework for the co-analysis of multimodal articulations, yielding a deeper understanding of (a) how the nucleus of an utterance and visual prosody interact to convey the intent of the utterance, and (b) how synchronization with other modalities affects the production of multimodal co-articulation. To further improve robustness and recognition accuracy, a set of classifiers will be designed and fused in a way that takes the diversity among them into account.
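The abstract does not say how classifier diversity will be measured or how the classifiers will be fused. Purely as an illustration, the Python sketch below assumes the pairwise disagreement measure (the fraction of samples on which two classifiers output different labels) as the diversity criterion, a greedy selection that starts from the most accurate classifier, and plurality voting as the fusion rule. The classifier names (`face`, `prosody`, `gesture`, `head`), the emotion labels, and the toy predictions are all hypothetical.

```python
from collections import Counter


def disagreement(a, b):
    """Fraction of samples on which two classifiers output different labels."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def accuracy(pred, truth):
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)


def select_diverse(preds, truth, k):
    """Assumed greedy, diversity-aware selection: start from the most accurate
    classifier, then repeatedly add the one with the highest mean disagreement
    with the classifiers already chosen."""
    k = min(k, len(preds))
    chosen = [max(preds, key=lambda name: accuracy(preds[name], truth))]
    while len(chosen) < k:
        rest = [name for name in preds if name not in chosen]
        chosen.append(max(
            rest,
            key=lambda name: sum(disagreement(preds[name], preds[c]) for c in chosen) / len(chosen),
        ))
    return chosen


def majority_vote(preds, names):
    """Plurality vote per sample; ties go to the earliest classifier
    (in `names` order) whose label has the top count."""
    fused = []
    for votes in zip(*(preds[name] for name in names)):
        counts = Counter(votes)
        top = max(counts.values())
        fused.append(next(v for v in votes if counts[v] == top))
    return fused


# Hypothetical ground truth and per-modality classifier outputs.
truth = ["happy", "sad", "neutral", "happy", "sad", "neutral"]
preds = {
    "face":    ["happy", "sad", "neutral", "happy", "neutral", "neutral"],
    "prosody": ["happy", "neutral", "happy", "happy", "sad", "sad"],
    "gesture": ["neutral", "sad", "sad", "happy", "sad", "neutral"],
    "head":    ["happy", "sad", "neutral", "happy", "neutral", "sad"],
}

ensemble = select_diverse(preds, truth, k=3)
fused = majority_vote(preds, ensemble)
print(ensemble)                # ['face', 'prosody', 'gesture']
print(accuracy(fused, truth))  # 1.0
```

On this toy data the greedy step prefers `prosody` to the more accurate `head`, because `head` disagrees with `face` on only one sample and so adds little new information. The fused vote then recovers every label, although the third sample is a three-way tie settled by the tie-break rule rather than by consensus.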
Systematic methods will be developed to evaluate the classifiers using various performance metrics (e.g., precision, recall, and F-measure), graphical analyses, and measure functions. The outcomes of this research, together with the PI's prior work, will ultimately enable the development of a perceptual interface for AutoTutor (an artificially intelligent, web-based tutoring system), providing a natural means of interacting with multimedia instructional content.
Broader Impact: The results of this research will have a significant impact on the understanding and tracking of multimodal communication in humans and agents. Studying the interplay between complementary modalities, and the prosodic manifestations of their synchronization, will also broaden the understanding of multi-channel communication in cognitive science, discourse processing, linguistics, and human-machine interaction. This, in turn, will enable innovative applications such as collaborative environments for agents and humans, and assistive technologies for the elderly and people with disabilities. The long-term vision of the proposed research is to develop a perceptual interface for web-based tutoring systems such as AutoTutor. An enhanced, artificially intelligent web-based tutor offers significant opportunities to improve the math and science preparation of incoming engineering and science undergraduates from Memphis City Schools and of other regional or national clients. The PI will also create an online collaborative learning environment, using newer frameworks such as Web 2.0, to organize large amounts of digital content so that communities of learners can effectively share and co-manage the information. The software and databases developed as part of this project will be made available to other researchers through the project website.
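The abstract lists precision, recall, and F-measures among the metrics used to evaluate the classifiers. A minimal per-class sketch in Python follows, assuming single-label predictions and the standard definition F_beta = (1 + beta^2) * P * R / (beta^2 * P + R), where beta = 1 gives the usual F1. The head-nod labels and data are hypothetical, and macro-averaging is only one of several ways to summarize scores across classes.

```python
def precision_recall_f(truth, pred, label, beta=1.0):
    """Per-class precision, recall, and F-beta, treating `label` as the positive class."""
    pairs = list(zip(truth, pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    b2 = beta * beta
    denom = b2 * precision + recall
    f_beta = (1 + b2) * precision * recall / denom if denom else 0.0
    return precision, recall, f_beta


def macro_f(truth, pred, beta=1.0):
    """Unweighted mean of per-class F-beta over every label seen in truth or pred."""
    labels = sorted(set(truth) | set(pred))
    return sum(precision_recall_f(truth, pred, lab, beta)[2] for lab in labels) / len(labels)


# Hypothetical head-nod annotations vs. detector output.
truth = ["nod", "nod", "none", "none", "nod", "none"]
pred = ["nod", "none", "none", "none", "nod", "none"]

p, r, f1 = precision_recall_f(truth, pred, "nod")
print(round(p, 3), round(r, 3), round(f1, 3))                         # 1.0 0.667 0.8
print(round(precision_recall_f(truth, pred, "nod", beta=2.0)[2], 3))  # 0.714
print(round(macro_f(truth, pred), 3))                                 # 0.829
```

Raising beta to 2 weights recall more heavily, so the `nod` score drops from 0.8 to about 0.714 because this detector misses one of the three true nods even though it never raises a false alarm.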