Collaborative Research: CIF: Medium: Theoretical Foundations of Compositional Learning in Transformer Models
University of Wisconsin-Madison, Madison, WI
Investigators
Abstract
Large Language Models (LLMs) based on transformer architectures, such as GPT-4, Llama 2, and Claude 3, have demonstrated remarkable emergent capabilities in compositional reasoning, allowing them to tackle complex tasks by decomposing them into simpler intermediate steps. Examples of these tasks include text and code generation, basic arithmetic and problem solving, and answering complex questions. Despite these empirical advances, the underlying mechanics of these capabilities remain largely unexplored. This collaborative research project aims to investigate the theoretical foundations of compositional learning in transformer models, focusing on three key areas: model expressivity, statistical learning theory, and optimization. It aims to develop novel learning guarantees, algorithms, architectures, and design principles that significantly advance the development of more capable and interpretable Artificial Intelligence (AI) and LLM systems. The research findings will be incorporated into educational curricula, fostering a diverse community around transformers, compositional learning, and their applications. The project will also engage the broader public through workshops and outreach activities, promoting responsible AI practices and AI education for undergraduate and K-12 students.
The first thrust will explore the expressive capacity of transformers augmented with loops, memory, and external tools, which are essential for compositional reasoning. The second thrust will examine the statistical properties of autoregressive training using compositional data to understand its limits, benefits, and ability to generalize to novel problem instances. This is expected to lead to new theories of compositional learning that will highlight the role of skill acquisition and composition. The third thrust will investigate the optimization principles of compositional learning with transformers. This research will shed light on the optimization landscape and identify methods for more efficient training of transformers through compositional techniques.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.