EAGER: Approximate Inference Robust Speech Processing via Sampling

$150,000 | FY2011 | CSE | NSF

Drexel University, Philadelphia PA

Investigators

Abstract

Speech enhancement and speaker recognition are, in principle, related tasks. Knowledge of the speaker allows for better speech enhancement, while speech with less interference makes the speaker easier to recognize. Statistical models for these two tasks can be coupled to yield a principled approach to performing both jointly; however, the complexity of exact inference in the resulting statistical model, which must straddle a nonlinear feature calculation, is prohibitive. For this reason, approximate inference techniques that trade some of the performance of exact inference for lower complexity are of interest. This EArly-concept Grant for Exploratory Research (EAGER) brings Gibbs sampling approximate inference techniques to bear on joint speech enhancement and speaker recognition, so that they can be compared with several variational approximate inference techniques. The ultimate aim of the research is to partition the space of attainable complexity and performance into regimes that indicate which technique should be used and what performance is attainable at a given complexity. To obtain this partitioning, a thorough empirical evaluation on several large speech corpora will be carried out.

Speech enhancement and speaker recognition technology finds multiple uses in defense, commercial, and medical applications. Many of these would benefit from the performance improvements brought about by successfully integrating these two interdependent tasks. Potential applications include dialogue systems for controlling, for example, television sets using distant microphones, where additive noise is a significant source of distortion. Additionally, speech enhancement is useful for hearing aids, where speaker-dependent prior information can improve the intelligibility of speech.
