
Inductive learning of nonlocal phonological interactions

$214,541 | FY2017 | SBE | NSF

New York University, New York NY

Abstract

Language is a fundamental and universal aspect of human cognition. Linguistic research over the past five decades has established that language structure is governed by detailed rules. These rules constrain meanings, sentences, words, and sound patterns; sound patterns are the focus of the proposed research. For many years, sound structure was investigated primarily by theorizing. More recently, linguists have begun to test these theories using experiments and computational models. Computational models are valuable because they can be created only from a complete and explicit understanding of the underlying rules. If a model succeeds in learning human-like rules when given the same data that is available to human learners, then it can shed light on the rules that constitute human knowledge of language and on how humans learn these rules. In addition to helping scientists understand the human mind, computational models and the datasets they use are invaluable in developing applied computational tools for machine translation, language identification, and artificial intelligence.
The rules that govern sound patterns differ in nuanced ways between languages, and they can be divided into two kinds. First, all languages have rules that restrict how sounds interact with the sounds that immediately precede or follow them: for example, in English, words can begin with "pr" but not "pn", whereas in Greek, words can begin with either sequence. Second, some languages also have nonlocal rules, which restrict the interactions of sounds that are not adjacent. Languages such as Hungarian and Turkish have vowel harmony, which means that all the vowels in a word tend to share certain features of their pronunciation. Navajo (Southwestern United States) has consonant harmony: consonants have to match in certain features. In languages such as Quechua (spoken in South America) and Amharic (Africa), certain features of consonants have to mismatch.
Linguists have known about these patterns for a long time, and there are many theories of how they are cognitively represented. But these nonlocal rules continue to stymie computational models, because in order to notice them, the computer has to consider many more possibilities than it would for rules on adjacent sounds. This is similar to how a password becomes harder for a computer to crack the longer it gets. The proposed research builds a computational model of nonlocal rules that identifies certain clues to their existence in a language. The project will compile corpora in six languages (Quechua, Shona, Hungarian, Russian, Aymara, Sundanese) to test the model's ability to find nonlocal rules. The model's performance will be compared with results from experiments with native speakers of several languages. The model, the corpora, and the experimental data will be made freely available to the scientific community and the public. Workshops in Bolivia will disseminate the research. The project will provide training for students in computational analysis and corpus building.
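
The password comparison in the abstract can be made concrete with a small counting exercise. The Python sketch below is only an illustration of that arithmetic and is not the project's model: it assumes a naive learner that tracks every intervening sound explicitly, and the function names and the inventory size of 30 sounds are hypothetical choices made for the example.

```python
def local_pair_count(inventory_size: int) -> int:
    """Count the distinct two-sound sequences (adjacent pairs only)."""
    return inventory_size ** 2


def windowed_sequence_count(inventory_size: int, max_gap: int) -> int:
    """Count the sequences a naive learner would have to consider when two
    sounds may be separated by 0..max_gap intervening sounds and every
    intervening sound is tracked explicitly (an assumption of this sketch).
    """
    return sum(inventory_size ** (2 + gap) for gap in range(max_gap + 1))


if __name__ == "__main__":
    n = 30  # hypothetical inventory size, chosen only for illustration
    print("adjacent pairs:", local_pair_count(n))
    for gap in range(1, 4):
        # Each additional intervening position multiplies the newest
        # term by n, just as each added character does for a password.
        print("up to", gap, "intervening sounds:", windowed_sequence_count(n, gap))
```

Under these assumptions the count grows by a factor of roughly the inventory size with each extra intervening position, which is why a learner needs some way to narrow the search before looking for nonlocal rules.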
