EAGER: Exploring the Role of Acoustic-Prosodic, Lexical, and Demographic Factors in Trustworthy Speech Perception for Conversational Agents

$98,140 | FY2023 | CSE | NSF

CUNY Hunter College, New York, NY

Abstract

This EArly-concept Grant for Exploratory Research (EAGER) project explores the role of acoustic-prosodic, lexical, and demographic factors in the perception of trustworthy synthesized speech for conversational agents. With advances in machine learning and speech technologies, conversational agents are increasingly capable of engaging in human-like conversations. Trust, however, is crucial for effective communication and collaboration, so understanding the signals of trustworthy speech is essential for successful interactions. Researchers across disciplines have sought to identify these signals, mostly in human speech, but a gap remains between what is understood about trustworthy human speech and what can be implemented in conversational agents. This project will conduct a series of exploratory perception studies designed to systematically investigate the prosodic, lexical, and demographic properties of trustworthy synthesized speech. To evaluate trust perception in contexts that require vulnerability, the studies will use real-world applications such as emotional support dialogues. By isolating the specific influences of acoustic-prosodic, lexical, and demographic factors, this research will advance the understanding of how trust is formed and maintained in human-machine interactions. The findings will help improve the perceived trustworthiness of conversational agents. This, in turn, will support wider adoption of technologies that benefit society in important application areas, including assistive robot companions in homecare settings for the elderly and homebound, psychological assessment and treatment, and assistive medical care in hospitals. The main objective of this research is to identify acoustic-prosodic, lexical, and demographic factors in trust perception of synthetic speech.
The project will systematically test the effects of these factors in synthesized speech on human trust using a large-scale crowdsourced perception study. Highly controlled parameters will be manipulated to test the effects of acoustic-prosodic features, including pitch, intensity, and speaking rate, as well as lexical features such as dialogue act, politeness, and complexity. In addition, the study will examine individual differences in trust perception across speaker and listener traits. By exploring individual factors as well as the interactive effects of combinations of prosodic, lexical, and demographic factors, this research will provide a comprehensive understanding of their influence on user trust. The findings will inform the design, development, and deployment of conversational agents, leading to more trustworthy and engaging human-machine interactions.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
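As an illustration of the controlled manipulation described in the abstract, the stimulus conditions for such a perception study could be enumerated as a full factorial grid. This is a minimal sketch only: the factor names come from the abstract, but the levels, the `FACTORS` dictionary, and the `build_conditions` helper are hypothetical and are not taken from the award record.

```python
from itertools import product

# Hypothetical factor levels, chosen only for illustration. The abstract
# names these factors but does not specify how many levels each one has.
FACTORS = {
    "pitch": ["low", "baseline", "high"],
    "intensity": ["low", "baseline", "high"],
    "speaking_rate": ["slow", "baseline", "fast"],
    "politeness": ["plain", "polite"],
}


def build_conditions(factors):
    """Return every combination of factor levels as a list of dicts."""
    names = list(factors)
    return [dict(zip(names, levels)) for levels in product(*factors.values())]


conditions = build_conditions(FACTORS)
print(len(conditions))  # 3 * 3 * 3 * 2 = 54 conditions
print(conditions[0])
```

A full grid like this makes it possible to estimate both the individual effect of each factor and the interactions between factors, at the cost of a condition count that grows multiplicatively with every added factor or level.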
