CRII: RI: Towards Human-Level Assessment of Speech Quality and Intelligibility in Real-World Environments
Indiana University, Bloomington IN
Investigators
Abstract
Separating speech from background noise is crucial for many speech-based applications, including hearing prostheses, robotics, and multimedia communication. Many speech separation algorithms perform reasonably well when they are tested in simulated environments, but this level of performance does not always carry over to real environments, which are more nuanced. For example, a common complaint of many hearing aid users is that their hearing aid is not effective in noisy environments such as restaurants. Current computational measures do not enable practical or convenient speech assessment in everyday environments, and this is a major hurdle for improving real-world separation performance. In addition, the end-user has largely been left out of the development and evaluation process, which is not ideal since an approach's usefulness is ultimately determined by people. The objective of this project is to develop computational evaluation algorithms to better assess speech quality and intelligibility in real environments. A key area of research focuses on developing novel, data-driven assessment algorithms that use deep learning to predict human assessment scores, which enables testing in real environments. Considering the recent success that deep learning has had in speech processing, this new assessment approach is promising and differs substantially from prior approaches. The project also determines the relationship between spectral-temporal speech attributes and human assessment scores. Quantifying this relationship ensures that assessment algorithms are accurate and have strong agreement with human evaluations. An effective integration of human assessment in speech separation algorithm development should result in improved separation algorithms, which ultimately benefits users and applications. This is expected since accurate assessment enables researchers to more easily identify and correct weaknesses based on real-world environmental factors. The research activities lay the foundation for the emerging research area of improving realism in speech processing applications and offer key insights on human perception to the larger scientific community.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.