Collaborative Research: RI: Medium: MoDL: Mathematical and Conceptual Understanding of Large Language Models
Stanford University, Stanford CA
Investigators
Abstract
Large language models (LLMs) have achieved unprecedented success in natural language processing (NLP). Because language models are widely expected to become a cornerstone of artificial intelligence in the near future, there is a need to understand them and to convey that understanding to regulators and the general public. These models are based on deep neural networks trained on vast quantities of text, and they have proven highly useful for tasks such as question answering, text classification, machine translation, and summarization. Despite this empirical success, there is little understanding of their inner workings. This project seeks to bridge that gap by developing a conceptual and mathematical understanding of how LLMs are trained and used. The project will also develop and disseminate instructional materials and draw on ideas from the research to strengthen ongoing programs at the investigators' institutions that help increase participation in computing by individuals from underrepresented groups.
The project has three components. (1) We will first build simplified generative models that capture the intrinsic structure of text, and analyze language models trained on text drawn from such generative models. (2) We will then analyze why the learned language models encode useful information that helps with a wide range of downstream tasks. (3) Finally, we will analyze and design new adaptation methods for downstream tasks with quantitative guarantees on sample and computational efficiency.
Education and outreach plans are integrated into this project: the investigators will develop a new introductory course in machine learning and disseminate its instructional materials, mentor graduate and undergraduate students from underrepresented groups (through the Princeton Freshman Scholars Institute, the Stanford Summer Teacher Research Program, and REUs), and organize research workshops to promote conversations between the theoretical machine learning and NLP communities. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.