
Doctoral Dissertation Research: Investigating cognitive and communicative pressures on natural language lexicons

$11,984 · FY2015 · SBE · NSF

Massachusetts Institute Of Technology, Cambridge MA


Abstract

Understanding how humans produce and comprehend language is a critical step toward understanding high-level human cognition and the human brain more generally. Moreover, basic research into human language has been, and will continue to be, useful for building computational natural language processing systems that enable humans to interact naturally with computers. The lexicons of the world's thousands of languages (that is, the sets of words that exist in each language) offer a particularly rich source of insight into the mechanisms of language production and comprehension. The words of any given language have undergone thousands of years of evolution, sometimes changing dramatically over one or two generations as sounds change, new words are invented or borrowed from other languages, and old words die out. What all languages have in common, however, is that they enable their speakers to communicate successfully with one another. A language's lexicon is therefore necessarily constrained by the cognitive and communicative demands placed on its speakers. Consequently, studying the statistical properties of lexicons, the ways lexicons evolve, and the process by which words are formed is a promising avenue for answering fundamental questions about human cognition.

Building on this research group's previous work showing that lexicons tend to be structured for efficient communication, this project will harness large cross-linguistic data sets available through the Internet, including Wikipedia and Google Books, to study the lexicons of a large number of the world's languages (approximately 100). Specifically, the analysis will focus on how words cluster together or spread out in phonetic space, exploring the competing demands for words to consist of sequences that are easy to pronounce and easy to comprehend, yet also to be phonetically distinct from one another.
A second major component of this work is a series of behavioral experiments with human participants that, in a controlled laboratory setting and in a smaller number of languages, explore the mechanisms underlying how words change over time. Finally, a computational model will integrate the insights from the statistical analyses and the behavioral experiments in order to explore and predict how words enter and exit the lexicon over time. This research program has implications not only for understanding higher-level human cognition but also for engineering applications that involve human-computer interaction in natural language, and for any application that requires building a cognitively tractable communication system that allows people to communicate efficiently.
