Audiovisual Distinctive-Feature-Based Recognition of Dysarthric Speech

$668,575 · FY2005 · CSE · NSF

University of Illinois at Urbana-Champaign, Urbana, IL

Investigators

Mark A Hasegawa-Johnson (contact), Adrienne L Perlman, Jon R Gunderson, Thomas S Huang

Abstract

Automatic dictation software with reasonably high word recognition accuracy is now widely available to the general public. However, many people with gross motor impairments, including some people with cerebral palsy and closed head injuries, have not benefited from these advances: their general motor impairment includes a component of dysarthria (reduced speech intelligibility caused by neuro-motor impairment), and the same motor impairment often precludes normal use of a keyboard. For this reason, dysarthric users now often find it easier to use a small-vocabulary automatic speech recognition system, with code words representing letters and formatting commands, and with acoustic speech recognition models carefully adapted to the speech of the individual user. Development of such individualized speech recognition systems remains extremely labor-intensive, however, because so little is understood about the general characteristics of dysarthric speech. In this project, the PI will study the general audio and visual characteristics of articulation errors in dysarthric speech, and apply the results to the development of speaker-independent large-vocabulary and small-vocabulary audio and audiovisual dysarthric speech recognition systems. More specifically, the PI will research word-based, phone-based, and phonologic-feature-based audio and audiovisual speech recognition models for both small-vocabulary and large-vocabulary speech recognizers designed for unrestricted text entry on a personal computer. The models will be based on audio and video analysis of phonetically balanced speech samples from a group of speakers with dysarthria, categorized into the following four groups: very low intelligibility (0-25% intelligibility, as rated by human listeners), low intelligibility (25-50%), moderate intelligibility (50-75%), and high intelligibility (75-100%).
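The four intelligibility groups above can be expressed as a simple lookup. The following is a minimal sketch, not the project's own tooling: the function name `intelligibility_band` is hypothetical, and because the abstract's ranges share their endpoints (25, 50, 75), the sketch assumes a score on a shared boundary falls into the higher group.

```python
def intelligibility_band(percent: float) -> str:
    """Map a listener-rated intelligibility score (0-100) to one of four groups.

    Assumption (not stated in the abstract): a score exactly on a shared
    boundary (25, 50, or 75) is assigned to the higher group.
    """
    if not 0 <= percent <= 100:
        raise ValueError("intelligibility must be between 0 and 100 percent")
    if percent < 25:
        return "very low"
    if percent < 50:
        return "low"
    if percent < 75:
        return "moderate"
    return "high"
```

For example, `intelligibility_band(62.5)` falls in the moderate group under these assumptions.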
Interactive phonetic analysis will seek to describe the talker-dependent characteristics of articulation error in dysarthria; based on analysis of preliminary data, the PI hypothesizes that manner-of-articulation errors, place-of-articulation errors, and voicing errors are approximately independent events. Preliminary experiments also suggest that different dysarthric users will require dramatically different speech recognition architectures, because the symptoms of dysarthria vary so much from subject to subject. The PI will therefore develop and test at least three categories of audio-only and audiovisual speech recognition algorithms for dysarthric users: phone-based and whole-word recognizers using hidden Markov models (HMMs), phonologic-feature-based and whole-word recognizers using support vector machines (SVMs), and hybrid SVM-HMM recognizers. The models will be evaluated to determine the overall recognition accuracy of each algorithm, changes in accuracy due to learning, group differences in accuracy due to severity of dysarthria, and dependence of accuracy on vocabulary size.
Broader Impacts: This research will lay the foundation for constructing a speech recognition tool for practical use by computer users with neuro-motor disabilities. Tools and data developed in this project will all be released as open source, and will be designed so they can be easily ported to an open-source audiovisual speech recognition system for dysarthric users. The work may also have applicability beyond the target community, in that project outcomes may be relevant to many other populations (e.g., people with foreign accents) who have trouble training current automatic speech recognition (ASR) systems.
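The independence hypothesis stated in the abstract (manner, place, and voicing errors as approximately independent events) implies that the joint probability of any combination of the three error types should be close to the product of the individual error rates. The sketch below illustrates one way such a comparison could be made; it is not the investigators' analysis. The function name `independence_gap` and the input format (one triple of boolean error flags per annotated speech token) are assumptions made for illustration.

```python
from itertools import product

def independence_gap(tokens):
    """Compare observed joint error rates with the product of marginals.

    tokens: sequence of (manner_err, place_err, voicing_err) booleans,
            one triple per annotated token (hypothetical input format).
    Returns a dict mapping each of the 8 error combinations to
    (observed_joint_probability, probability_expected_under_independence).
    """
    n = len(tokens)
    if n == 0:
        raise ValueError("need at least one token")
    # Marginal error rate for each of the three feature types.
    marginals = [sum(1 for t in tokens if t[i]) / n for i in range(3)]
    result = {}
    for combo in product((False, True), repeat=3):
        observed = sum(1 for t in tokens if tuple(t) == combo) / n
        expected = 1.0
        for i, flag in enumerate(combo):
            expected *= marginals[i] if flag else 1.0 - marginals[i]
        result[combo] = (observed, expected)
    return result
```

If the errors were exactly independent, the two numbers in each pair would match; large gaps would count as evidence against the hypothesis. A real analysis would also need a significance test, which this sketch omits.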
