CAREER: Theoretical Foundations of Modern Machine Learning Paradigms: Generative and Out-of-Distribution

$419,335 | FY2023 | CSE | NSF

Carnegie Mellon University, Pittsburgh PA

Investigators

Abstract

The performance of large-scale machine learning models has improved dramatically over the last several years. Models like DALL-E, trained on vast amounts of text and image data, can generate high-quality images from just a verbal description ("prompt") of the image's content, even for prompts very far from anything the model was trained on. Models like ChatGPT, trained on vast amounts of text data and interactions with humans, can convincingly behave like a human interlocutor and even solve simple mathematics exam questions. Though these systems are an impressive feat of engineering, our scientific understanding of which ingredients in their training recipe matter is severely lacking. As a result, improving them often involves extensive trial and error, which consumes considerable human effort and computational resources. Our understanding of their failure modes (and often even of how to evaluate them!) is poorer still. Consequently, deploying them in any safety-critical scenario is, at present, unlikely. The goal of this project is to build scientific and mathematical foundations for understanding the failure modes of modern machine learning models, especially when the data they are deployed on differs from the data they were trained on. The project will outline the challenges in building mathematical and scientific footing for understanding and improving modern generative models, and will formalize settings in which distributions substantially different from the training distribution can be handled. The investigator will establish useful formalizations of learning paradigms and the structural assumptions that make them tractable, and will develop new algorithmic tools for these settings.
The project will have two major prongs: (1) developing algorithmic tools for sampling, inference, and learning in probabilistic generative models, along with analytical machinery for understanding the failure modes of different families of models and algorithmic solutions that circumvent or ameliorate them; and (2) studying learning settings that involve shifts in the data distribution, including domain generalization (the learner is given data from several environments and is expected to perform well in a new one), domain translation (the model is given data from two domains and is expected to learn how to "translate" between them), and continual learning (data from different environments arrives in an online fashion, and the learner is expected to retain good performance in all of them). This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
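To make the continual-learning requirement ("retain good performance in all environments") concrete, the sketch below scores a learner from an accuracy matrix, where entry [i][j] is the accuracy on environment j after the learner has been trained on environments 0 through i in sequence. This is a hypothetical illustration, not the project's method: the function name `continual_learning_summary` is invented here, and the two summary numbers (average final accuracy and average forgetting, i.e. the drop from the best earlier accuracy on an environment to its final accuracy) are one common convention from the continual-learning literature, assumed for illustration.

```python
from typing import Dict, List


def continual_learning_summary(acc: List[List[float]]) -> Dict[str, float]:
    """Summarize a continual-learning run (hypothetical helper, for illustration).

    acc[i][j] is the accuracy on environment j measured after training
    sequentially on environments 0..i. The matrix must be square.
    """
    n = len(acc)
    if n == 0 or any(len(row) != n for row in acc):
        raise ValueError("acc must be a non-empty square matrix")

    # Accuracy on every environment after the final training stage.
    final = acc[-1]
    avg_final = sum(final) / n

    # Forgetting for environment j: the best accuracy reached on j at any
    # stage from j up to (but excluding) the last one, minus its final accuracy.
    # The last environment has no earlier stage to forget from, so it is skipped.
    if n == 1:
        avg_forgetting = 0.0
    else:
        drops = []
        for j in range(n - 1):
            best_earlier = max(acc[i][j] for i in range(j, n - 1))
            drops.append(best_earlier - final[j])
        avg_forgetting = sum(drops) / len(drops)

    return {"avg_final_accuracy": avg_final, "avg_forgetting": avg_forgetting}


if __name__ == "__main__":
    # Made-up accuracies for three environments presented in sequence.
    acc = [
        [0.90, 0.10, 0.10],
        [0.70, 0.85, 0.20],
        [0.60, 0.80, 0.88],
    ]
    summary = continual_learning_summary(acc)
    print(summary)  # avg_final_accuracy is about 0.76, avg_forgetting about 0.175
```

A learner that "retains good performance in all environments" in the abstract's sense would show high average final accuracy together with forgetting near zero; the numbers in the usage example are fabricated purely to exercise the function.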
