Leveraging Natural Language Processing for Reverberant Speech Enhancement in Cochlear Implants

$173,782 | R56 | FY2023 | DC | NIH

Duke University, Durham NC

Investigators

Leslie M. Collins (contact), John H.L. Hansen

Abstract

The overarching goal of this project is to develop algorithms that address the difficulties cochlear implant (CI) users experience when interpreting speech in reverberant listening environments such as churches, auditoriums and classrooms. Recent research has made progress in this area using time-frequency (T-F) masking techniques, but these algorithms are often not robust to changing acoustic environments or are not amenable to real-time processing. Machine learning (ML) and artificial intelligence (AI) techniques have recently proliferated in many application areas, but to date AI/ML approaches to reverberation for CI users have shown limited success.

Our proposed approach is to investigate several AI/ML speech enhancement methods drawn from the natural language processing (NLP) field that, in essence, recognize speech in reverberation and then clean it. We propose to use phoneme-based recognition and automatic speech recognition (ASR) approaches to develop and test our reverberation mitigation algorithms. We will assess final algorithm performance on the open-source, NIH-supported CCi-MOBILE CI research platform, chosen for the ease of use and flexibility needed to develop and prototype CI signal processing algorithms.

Aim 1 will investigate the real-time feasibility of exploiting phoneme recognition for ML-based T-F masking in CIs. We will develop a novel phoneme-based T-F mask estimation algorithm and conduct speech recognition tests with the algorithm in an offline mode to compare conventional and phoneme-based T-F masking. This work will determine whether phoneme knowledge is beneficial for speech enhancement in CIs.

Aim 2 will investigate the utility of real-time T-F mask estimation in CI users. We will implement in real time in CCi-MOBILE various T-F mask estimation algorithms from the literature that mitigate reverberation, including our novel phoneme-based T-F algorithm developed in Aim 1. In addition to their impact on speech intelligibility, the algorithms will be benchmarked against CI computational limits and the time delays that remain tolerable with respect to audiovisual asynchrony. This work will evaluate the effectiveness of T-F mask estimation algorithms under real-time operating conditions.

Aim 3 will investigate advancing speech intelligibility for CI users via ASR and text-to-speech synthesis (ASR-TTS). We will investigate various front-end speech enhancement strategies to improve ASR predictions, as well as TTS engines with generic and familiar synthetic voices. This work will use CCi-MOBILE to evaluate the utility of ASR-TTS and the effect of speaker familiarity on reverberant speech intelligibility in CI users.

Our team brings the AI/ML, hardware, experimental testing and audiology experience needed for successful research. CCi-CLOUD, a cloud feature of CCi-MOBILE, will be used to facilitate remote and collaborative CI user studies. Our work is highly innovative and has the potential to instigate a paradigm shift towards AI/ML-driven auditory prostheses that leverage NLP to adapt speech processing strategies to acoustic settings and so maximize user benefit. Demonstrated success will improve the quality of life of CI users.
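For readers unfamiliar with T-F masking, the general idea can be illustrated with a minimal sketch. This is not the project's algorithm: it computes an "oracle" ratio mask that assumes access to the clean signal, whereas the abstract proposes estimating masks with ML. The function names (`stft_mag`, `ratio_mask`), the frame parameters, and the toy echo-based reverberation are all assumptions made for illustration.

```python
import numpy as np

def stft_mag(signal, frame_len=256, hop=128):
    """Magnitude spectrogram from a Hann-windowed short-time Fourier transform."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))  # shape: (n_frames, frame_len // 2 + 1)

def ratio_mask(clean_mag, reverb_mag, eps=1e-8):
    """Oracle mask in [0, 1]: fraction of each T-F bin attributed to clean speech."""
    return np.clip(clean_mag / (reverb_mag + eps), 0.0, 1.0)

# Toy stand-in for speech: white noise (hypothetical test signal).
rng = np.random.default_rng(0)
clean = rng.standard_normal(4000)

# Toy stand-in for a room response: direct path plus three decaying echoes.
impulse = np.zeros(400)
impulse[0] = 1.0
impulse[100::100] = [0.6, 0.4, 0.2]
reverberant = np.convolve(clean, impulse)[: len(clean)]

clean_mag = stft_mag(clean)
reverb_mag = stft_mag(reverberant)

# Attenuate T-F bins where reverberant energy exceeds the clean energy.
mask = ratio_mask(clean_mag, reverb_mag)
enhanced_mag = mask * reverb_mag
```

In a deployed system the clean signal is unavailable, so the mask has to be predicted from the reverberant input alone; that estimation step is what Aims 1 and 2 address.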
