CAREER: Automatic Speech-Based Longitudinal Emotion and Mood Recognition for Mental Health Monitoring and Treatment
Regents Of The University Of Michigan - Ann Arbor, Ann Arbor MI
Investigators
Abstract
Effective treatment and monitoring for individuals with mental health disorders is an enduring societal challenge. Regular monitoring increases access to preventative treatment, but is often cost prohibitive or infeasible given high demands placed on health care providers. Yet, it is critical for individuals with Bipolar Disorder (BPD), a chronic psychiatric illness characterized by mood transitions between healthy and pathological states. Transitions into pathological states are associated with profound disruptions in personal, social, vocational functioning, and emotion regulation. This Faculty Early Career Development Program (CAREER) project investigates new approaches in speech-based mood monitoring by taking advantage of the link between speech, emotion, and mood. The approach includes processing data with short-term variation (speech), estimating mid-term variation (emotion), and then using patterns in emotion to recognize long-term variation (mood). The educational outreach includes a design challenge, created with Iridescent, a science education nonprofit, that teaches emotion recognition to underserved children and their parents in informal learning settings. The research investigates methods to model naturalistic, longitudinal speech data and associate emotion patterns with mood, addressing current challenges in speech emotion recognition and assistive technology that include: generalizability, robustness, and performance. The approaches generalize to conditions whose symptoms include atypical emotion, such as post-traumatic stress disorder, anxiety, depression, and stress. The research forwards emotion as an intermediate step to simplify the mapping between speech and mood; emotion dysregulation is a common BPD symptom. Emotion is quantified over time in terms of valence and activation to improve generalizability. Nuisance modulations are controlled to improve robustness. Together, they result in a set of low-dimensional secondary features whose variations are due to emotion. These secondary features are segmented to create a coarser temporal description of emotion. This provides a means to map between speech (a quickly varying signal) and user state (a slowly varying signal), advancing the state-of-the-art. The results provide quantitative insight into the relationship between emotion variation and user state variation, providing new directions and links between the fields of emotion recognition and assistive technology. The focus on modeling emotional data using time series techniques results in breakthroughs in the design of emotion recognition and assistive technology algorithms.
View original record on NSF Award Search →