SGER: Exploring New Auditory Perception Based Approaches to ASR
Georgia Tech Research Corporation, Atlanta GA
Investigators
Abstract
Over the last two decades, automatic speech recognition (ASR) has been addressed using a pattern matching paradigm with a top-down characterization of acoustic, pronunciation, and language models as an integrated finite state network. Although there have been significant advances over the years, progress has recently begun to slow, and the current state of the art has yet to rival human speech recognition capability. This is mainly due to: (1) the inability to obtain a complete specification of all the knowledge sources needed to solve the ASR problem in a top-down manner, and (2) the ASR robustness problem. This project is developing and evaluating an auditory perception approach to ASR that is both knowledge-based and data-driven. By decomposing the ASR problem into the detection of acoustic and phonetic landmarks in the speech signal, followed by a sequence of spectral and temporal knowledge integration stages, the proposed approach mimics the human auditory perception process while retaining many of the key features of stochastic speech and language modeling that have contributed to the success of the currently prevailing pattern matching approaches. The project is investigating the feasibility of the feature detection and knowledge integration algorithms that are foundational to the approach, and is creating a plug-and-play platform to facilitate cooperation between the broader speech science and speech processing communities. This research should facilitate a better understanding of the link between auditory perception and ASR, give students and researchers educational opportunities to learn speech fundamentals, challenge the signal processing community to develop new speech feature detection algorithms, and pave the way for a common software and evaluation platform for collaborative ASR research.