EAGER: Investigating the Role of Discourse Context in Speech-Driven Facial Animations
University of Texas at Dallas, Richardson, TX
Investigators
Abstract
This EArly-concept Grant for Exploratory Research (EAGER) project analyzes the role of discourse and dialog context in the generation of believable, human-like behaviors for a conversational agent (CA), i.e., a virtual agent that interacts with a user. CAs aim to engage users by displaying human-like behaviors not only through speech but also through facial gestures. One useful modality to drive facial behaviors is speech. Spoken language carries important information beyond the verbal message that a CA engine should capitalize on. A challenge in speech-driven animation is to generate behaviors that respond to the discourse context. This proposal presents a top-down approach to explore the importance of considering contextual information in the modeling of speech-driven facial gestures. The project starts with speech-driven models, based on dynamic Bayesian networks, which do not capture the specific discourse context, responding only to the properties of the acoustic features. Then, the study considers discourse-specific models in which the intent of the gestures is known. The study defines a specific, controlled domain as a testbed, recording multiple human interactions. Similar speech-driven models are then trained, constrained by the specific discourse function. The study evaluates the differences in the perceived naturalness, appropriateness, and rapport of the generated facial gestures. The study explores which discourse aspects affect the facial animation models, and which are more domain-specific or domain-independent. By incorporating the intrinsic discourse information, the proposed models generate behaviors that respond to conversational functions, addressing one of the limitations in speech-driven facial animations. The findings have a long-term impact in a variety of health care applications, such as helping hearing-impaired individuals and teaching social skills to autistic children.
Likewise, discourse-dependent speech-driven models can play a key role in better tutoring systems that display human-like behaviors to communicate and engage with the students.