A Shared Database for the Study of Phonological Development

$223,419 | R01 | FY2010 | HD | NIH

Carnegie-Mellon University, Pittsburgh PA

Abstract

DESCRIPTION: The study of phonological development has important implications for the diagnosis and treatment of language disorders, models of the biological bases of language production, the teaching of second languages, and the general advancement of linguistic theory. Recent advances in computational power make it possible for researchers to link high-quality digital recordings to phonological and phonetic transcriptions. Using standards such as Unicode, IPA, and XML, the CHILDES database project (http://childes.psy.cmu.edu) now provides universal access to large corpora of transcripts linked to audio for students of both first and second language acquisition, along with a wide array of tools for lexical, syntactic, and discourse analysis. However, the CHILDES Project has not yet built effective tools for phonological and phonetic analysis. We will close this gap by developing a new Java-based program called Phon that interfaces with the CHILDES transcription format. Phon provides: (1) easy user-controlled utterance boundary marking, (2) an input method for Unicode IPA transcription of child forms, (3) automatic alignment of segments in child forms to waveform regions, (4) automatic insertion of the IPA form for adult target words, (5) automatic alignment of child forms to the adult targets at both the segmental and prosodic levels, (6) tools for querying the database, and (7) tools for composing output reports. Phon will be configured to run either locally or over the web as a Java WebStart application. The construction of the new database will be supported by a group of 26 researchers who have agreed to contribute already collected and transcribed corpora from children learning 17 different languages. Subjects include bilingual children, normally developing monolinguals, and children with language disorders.
The data will be structured to facilitate testing of models regarding babbling universals, variant paths in segmental and prosodic development, markedness effects, prosodic context effects, segmentation patterns, statistical learning, frequency effects, interlanguage transfer, diagnosis of disability, stuttering patterns, disfluency patterns, and the effects of morphology and syntax. Benchmarks will be established to emphasize the direct competitive testing of competing hypotheses from alternative theoretical and methodological positions.
