CAREER: Learning Structured Models with Natural Language Supervision

$472,331 · FY2023 · CSE · NSF

Massachusetts Institute of Technology, Cambridge MA

Investigators

Abstract

Current machine learning models struggle to understand visual scenes, perform household chores, and complete other tasks that require integrating low-level perception and action with high-level commonsense and background knowledge. This CAREER project will use language to bridge this gap, developing techniques that use language-based dataset annotations and large text corpora to guide the training of machine learning models for robotics, computer vision, and other problem domains. New approaches for learning with natural language supervision will reduce the amount of data needed to train machine learning models and will enable end users to shape model behavior without writing complex formal specifications. The project will provide research training to undergraduate and graduate students and will be integrated into a new workshop series that connects academic natural language processing researchers with researchers in other application areas, with a focus on providing learning and community-building opportunities for students from historically marginalized groups. The educational component of the project will develop new curriculum materials on natural language processing and human factors in artificial intelligence systems, targeting high school and undergraduate students as well as non-technical industry groups (such as journalists and policy researchers) studying the effects of automated decision-making systems.

The technical core of this project is a new family of probabilistic latent variable models in which latent representations of plans or percepts jointly generate task data and natural language annotations. When language annotations are available, they can directly supervise the content of these latent representations; on unannotated examples, information from text corpora may be used to constrain the distribution of the latent representations. Language thus plays two roles: as a source of information about the structure of individual training examples, and as a source of general, task-level background knowledge. Research will yield concrete instantiations of this modeling framework for policy learning, language modeling, and scene understanding. These instantiations will use language to produce structured, composable models that combine the flexibility of the deep learning toolkit with the sample efficiency and controllability of symbolic representations, while requiring neither massive labeled datasets nor precisely formalized symbolic domains.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
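To make the latent variable framework in the technical description concrete, here is a minimal toy sketch, assuming a single discrete latent plan z, one discrete observation x, and a one-word annotation l, so that p(x, l, z) = p(z) p(x | z) p(l | z). Every plan name and probability table is hypothetical, and `corpus_prior` merely stands in for a distribution over plans that might be estimated from a text corpus. This illustrates the general idea of annotations supervising a shared latent; it is not the project's actual model.

```python
import math

# Toy latent-variable model: a latent plan z jointly generates task data x
# and a language annotation l, i.e. p(x, l, z) = p(z) * p(x | z) * p(l | z).
# All names and numbers below are hypothetical illustrations.

PLANS = ["fetch_cup", "open_door", "wipe_table"]  # hypothetical latent plans
OBS = ["grasp", "push"]                           # hypothetical observations x
WORDS = ["cup", "door", "table"]                  # hypothetical annotations l

# p(z): hypothetical prior over plans, standing in for background knowledge
# that could be derived from a text corpus (used on unannotated examples).
corpus_prior = [0.5, 0.3, 0.2]

# p(x | z): rows are plans, columns are observations.
p_obs = [
    [0.8, 0.2],  # fetch_cup
    [0.3, 0.7],  # open_door
    [0.6, 0.4],  # wipe_table
]

# p(l | z): rows are plans, columns are annotation words.
p_lang = [
    [0.90, 0.05, 0.05],
    [0.05, 0.90, 0.05],
    [0.05, 0.05, 0.90],
]


def joint_scores(obs, word=None, prior=corpus_prior):
    """Unnormalized p(z, x) per plan, or p(z, x, l) if an annotation is given."""
    scores = []
    for z in range(len(PLANS)):
        s = prior[z] * p_obs[z][obs]
        if word is not None:
            s *= p_lang[z][word]  # the annotation directly informs z
        scores.append(s)
    return scores


def posterior(obs, word=None, prior=corpus_prior):
    """p(z | x) without an annotation, p(z | x, l) with one."""
    scores = joint_scores(obs, word, prior)
    total = sum(scores)
    return [s / total for s in scores]


def log_marginal(obs, word=None, prior=corpus_prior):
    """log p(x) or log p(x, l), with the latent plan summed out."""
    return math.log(sum(joint_scores(obs, word, prior)))


def entropy(dist):
    """Shannon entropy (nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in dist if p > 0)


if __name__ == "__main__":
    x = OBS.index("grasp")
    l = WORDS.index("table")
    print("p(z | x)    =", [round(p, 3) for p in posterior(x)])
    print("p(z | x, l) =", [round(p, 3) for p in posterior(x, word=l)])
```

With the observation `grasp` alone, the corpus-informed prior makes `fetch_cup` the most probable plan; adding the annotation `table` moves most of the posterior mass to `wipe_table` and lowers the posterior entropy. In this toy setting, that is the sense in which an annotation supervises the latent representation while the prior carries task-level background knowledge.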
