Deriving Hearing Knowledge from Speech Data

$79,854 | FY2017 | CSE | NSF

Johns Hopkins University, Baltimore MD

Investigators

Abstract

This EArly Grant for Exploratory Research investigates the hypothesis that speech evolved to exploit human hearing and that, as a result, properties of human hearing are imprinted on speech. Support for this hypothesis is sought by optimizing speech processing on large amounts of speech data for discrimination among speech sounds. The project intends to show that relevant hearing properties, consistent with the hypothesis, emerge in the optimized engineering modules. The focus is on modeling higher (cortical) levels of auditory processing, which are not usually studied in engineering programs. The newly created knowledge should be applicable to machine recognition of noisy speech. Linguistic messages carried in speech are coded redundantly in time and in frequency. These redundancies, introduced in frequency by synchronous vocal tract movements and in time by vocal tract inertia, are exploited by human cognition to extract reliable information-carrying elements from noisy speech. Two properties of human hearing are employed: 1) the ability to separate elements of the speech signal into different frequency channels, and 2) the ability to extract information about the temporal dynamics of the signals in these channels. Specifically, a deep neural net would take the output of an auditory-like spectral analysis and would be trained on the data to process this auditory-like spectrum through a bank of learnable two-dimensional, cortical-like spectro-temporal filters. The existence of such an architecture is supported by the current literature on the mammalian auditory cortex. Progress would therefore be gauged by evaluating the similarity of the derived 2-D filters to known properties of mammalian auditory cortical receptive fields, and by their effectiveness in extracting information about the underlying speech sounds that constitute speech messages.
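The filter-bank stage described in the abstract can be illustrated with a minimal NumPy sketch. It is not the project's implementation: the function names, the Gabor-like initialization, the filter sizes, and the random stand-in spectrogram are all assumptions made for illustration. In a trained network the filter values would be learned parameters rather than fixed patterns.

```python
import numpy as np

def gabor_filter(n_freq, n_time, spectral_mod, temporal_mod):
    """Hypothetical initializer: a Hann-windowed 2-D cosine over (frequency, time).

    spectral_mod and temporal_mod are modulation rates in cycles per bin
    and cycles per frame, respectively (assumed units).
    """
    f = np.arange(n_freq) - (n_freq - 1) / 2.0
    t = np.arange(n_time) - (n_time - 1) / 2.0
    window = np.outer(np.hanning(n_freq), np.hanning(n_time))
    phase = 2 * np.pi * (spectral_mod * f[:, None] + temporal_mod * t[None, :])
    return window * np.cos(phase)

def apply_filter_bank(spectrogram, filters):
    """Valid 2-D cross-correlation of a spectrogram with each filter.

    spectrogram: (n_bands, n_frames)
    filters:     (n_filters, kf, kt)
    returns:     (n_filters, n_bands - kf + 1, n_frames - kt + 1)
    """
    kf, kt = filters.shape[1:]
    # patches[f, t] is the (kf, kt) window whose top-left corner is at (f, t)
    patches = np.lib.stride_tricks.sliding_window_view(spectrogram, (kf, kt))
    return np.einsum("ftij,kij->kft", patches, filters)

# Stand-in for an auditory-like spectrogram: 24 bands by 100 frames of noise.
rng = np.random.default_rng(0)
spec = rng.standard_normal((24, 100))

# Three illustrative filters: temporal, spectral, and joint modulation.
filters = np.stack([
    gabor_filter(5, 9, 0.0, 0.1),
    gabor_filter(5, 9, 0.2, 0.0),
    gabor_filter(5, 9, 0.1, 0.1),
])

out = apply_filter_bank(spec, filters)
print(out.shape)
```

Each output channel is one filter's response across frequency and time. Comparing learned filter shapes against patterns such as these Gabor-like patches is one way to picture the similarity evaluation the abstract mentions.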
