CAREER: Theoretical foundations for deep learning and large-scale AI models

$262,117 · FY2024 · MPS · NSF

University of California-Berkeley, Berkeley, CA

Investigators

Abstract

Generative AI models have shown remarkable capabilities across various domains, making a transformative societal impact. However, their powerful capabilities present substantial challenges and risks due to limited theoretical foundations, especially in sensitive applications. The primary objective of this project is to establish a theoretical foundation for generative AI models, including language models and diffusion models. The project will examine the capabilities and limitations of neural networks such as transformers and ResNets within these models, and develop techniques to interpret the algorithms implicitly implemented in these black-box systems. The theoretical investigation will draw on a diverse range of subjects, including variational inference, sampling methods, high-dimensional statistics, computational complexity theory, and reinforcement learning theory. The results will provide valuable theoretical insights and promote the safe use of prevailing foundation models such as ChatGPT and DALL-E.

This project will establish a theoretical foundation to elucidate the capabilities and limitations of language models and diffusion models. The project will investigate three key learning modalities: in-context learning, generative modeling, and decision making. For in-context learning, the project will analyze which algorithms transformers can implicitly implement, develop techniques to interpret the algorithms implemented in transformers, and provide guarantees on optimization and generalization during meta-training. For generative modeling, the project will derive conditions under which neural networks can represent high-dimensional score functions for diffusion-based models. For decision making, the project will reveal how neural networks can be meta-trained to approximate bandit and reinforcement learning algorithms and investigate approaches to employing neural networks as decision-making agents. The outcomes will guide principled design and responsible deployment of AI models across disciplines. The activities include graduate student training and new course development.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
