ITR: Prosody Generation for Child Oriented Speech Synthesis

$2,750,000 · FY2002 · CSE · NSF

Oregon Health & Science University, Portland, OR

Investigators

Jan P Van Santen (contact), Alan Black, John-Paul Hosom, Richard Sproat

Abstract

This project focuses on innovative algorithms for generating highly expressive synthetic speech. Current text-to-speech synthesis (TTS) systems generate speech that lacks expressiveness, which is a serious obstacle to applying TTS in computer-based language and speech remediation for children. TTS has two advantages over recorded speech, which is currently the standard in remedial systems: (i) TTS provides complete flexibility in textual materials and enables interactivity and individualization, both of which are key to successful language teaching and remediation; (ii) TTS output can be modified more easily and along far more dimensions than recorded speech, including temporal, intonational, and spectral dimensions, so that speech output can be adjusted to a child's individual pattern of needs. Generating expressive speech involves three hard research problems: (i) computing abstract tags that specify, e.g., which words need emphasis and how the text is phrased (e.g., where to pause); (ii) computing a fundamental frequency contour from these tags; and (iii) severely modifying the stored speech fragments ("acoustic units") to realize these contours. The central goal of the project is to address these research problems and to create a TTS system that makes the next generation of TTS-based remedial systems viable.
