CAREER: RI: Structural Linguistic Generalization Through Expert-Designed Tasks

$468,808 | FY2023 | CSE | NSF

New York University, New York NY

Investigators

Abstract

As speakers of a language, we can understand complex sentences that we have never read before. We do so through a process called structural generalization: we divide the sentence into smaller parts, interpret the meaning of each part, and then put these meanings together to interpret the full sentence. Current language technologies, such as the virtual assistants installed on many mobile phones, are not always able to generate meaning in this way. This leads them to make mistakes when they are expected to understand more complex commands; these mistakes may be more common in languages that are spoken by smaller communities. The researchers involved in this project will create tests in multiple languages in order to measure a computational system’s structural generalization capabilities, and will use those tests to teach the system to generalize correctly. This will result in systems that work more reliably and across a wider range of languages. Alongside its technological contribution, this project will help train a new generation of scientists who are well versed in both artificial intelligence and the structure and diversity of human languages. The project will also draw new populations to the field through outreach events that will introduce students interested in language to artificial intelligence.

From a technical standpoint, the researchers will create both expert-designed and real-world semantic parsing tasks (mappings from natural language utterances to a formal language that captures their meaning), which will span a diverse sample of languages. In the expert-designed tasks, natural language inputs will be generated from a grammar created by a linguist, and the outputs will be lambda calculus formulas computed from the input. The distribution of linguistic structures will differ systematically between the training and test sets of each task (“structural splits”). The structural splits of the expert-designed tasks will be used to endow artificial neural networks, such as large language models, with an inductive bias that favors structural generalization; such an inductive bias will then lead to structural generalization on real-world tasks. This will be accomplished using a meta-learning approach, whose objective is to learn from one part of the split and generalize to the other. These behavioral experiments will be accompanied by an investigation of how the network’s internal properties, such as initial weights, training dynamics and loss landscape, give rise to structural generalization. Overall, by providing tools for measuring and promoting structural generalization across languages, this project will lead to more robust and sample-efficient language understanding systems.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
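
To make the idea of an expert-designed task with a structural split more concrete, here is a minimal Python sketch. It is an illustration only and not the project's actual grammar, formalism, or data: the toy vocabulary, the function names (`noun_phrase`, `sentence`, `structural_split`), the `the(lambda x. ...)` notation, and the choice to split by prepositional-phrase nesting depth are all assumptions made for this example.

```python
import random

# Hypothetical toy vocabulary; a real task would use a linguist-designed grammar.
NAMES = ["emma", "liam"]
VERBS = ["saw", "liked"]
NOUNS = ["cat", "mat", "table"]
PREPS = ["on", "beside"]


def noun_phrase(rng, depth, level=0):
    """Return (words, logical form) for a definite NP with `depth` nested PPs."""
    noun = rng.choice(NOUNS)
    var = f"x{level}"  # one bound variable per nesting level
    words = ["the", noun]
    body = f"{noun}({var})"
    if depth > 0:
        prep = rng.choice(PREPS)
        inner_words, inner_lf = noun_phrase(rng, depth - 1, level + 1)
        words += [prep] + inner_words
        body += f" & {prep}({var}, {inner_lf})"
    return words, f"the(lambda {var}. {body})"


def sentence(rng, depth):
    """Generate one (utterance, lambda-calculus-style formula) pair."""
    subj = rng.choice(NAMES)
    verb = rng.choice(VERBS)
    obj_words, obj_lf = noun_phrase(rng, depth)
    text = " ".join([subj.capitalize(), verb] + obj_words)
    return {"text": text, "lf": f"{verb}({subj}, {obj_lf})", "depth": depth}


def structural_split(n_per_depth=5, train_depths=(0, 1), test_depths=(2, 3), seed=0):
    """Train on shallow nesting, test on deeper nesting never seen in training."""
    rng = random.Random(seed)
    train = [sentence(rng, d) for d in train_depths for _ in range(n_per_depth)]
    test = [sentence(rng, d) for d in test_depths for _ in range(n_per_depth)]
    return train, test


train, test = structural_split()
for example in (train[0], test[-1]):
    print(example["text"], "->", example["lf"])
```

In this sketch, a system trained on `train` sees only objects with at most one prepositional phrase, while `test` contains deeper nesting, so success on `test` requires composing known parts in structures that were absent from training.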
