Theoretical Foundations For Demystifying Continual Learning

$517,498 · FY2025 · ENG · NSF

The Ohio State University, Columbus, OH

Investigators

Abstract

Continual learning (CL), also known as lifelong learning, seeks to enable artificial intelligence (AI) systems to learn new skills over time without forgetting what they already know, much as humans continuously refine and integrate knowledge. A major obstacle is “catastrophic forgetting,” in which an AI system loses competence on earlier tasks as it learns new ones. This project will advance the fundamental understanding of continual learning by developing theoretical tools that explain and predict how knowledge is transferred and forgotten across tasks. These insights will inform the design of new algorithms that better preserve and share knowledge over time. The broader significance lies in building more adaptive, reliable AI systems for applications ranging from robotics and personal assistants to scientific discovery. By strengthening the theoretical foundations of CL, the project will help ensure that AI systems can adapt in dynamic, real-world environments while remaining robust and trustworthy. The project will also help train the next generation of researchers and broaden public understanding of AI through workshops, teaching, and outreach to K–12 audiences. This project aims to establish a rigorous theoretical framework for understanding forgetting, generalization, and knowledge transfer in CL.
The project has three main thrusts: (i) developing and optimizing a new class of sequential replay algorithms designed to outperform existing concurrent replay methods, especially when tasks are dissimilar, and creating hybrid strategies that combine the two approaches; (ii) analyzing catastrophic forgetting and generalization in transformer-based models trained on time-varying classification tasks, with a focus on how evolving attention mechanisms govern knowledge transfer and retention; and (iii) investigating how transformers’ in-context learning ability behaves under shifts in the input distribution, and how such shifts affect training dynamics and forgetting. The methodology will combine mathematical analysis, algorithm design, and empirical validation on synthetic data and real-world datasets, using large-scale models such as deep neural networks and large language models (LLMs). By delivering both provable theoretical insights and practical algorithmic solutions, the project will advance the state of the art in CL and enable the deployment of AI systems that can continually adapt, learn safely in changing environments, and maintain strong performance across diverse and evolving tasks. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
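Thrust (i) contrasts sequential and concurrent replay. The sketch below is an illustrative toy under stated assumptions, not the project's algorithms. It uses noiseless linear regression and replaces gradient-descent training with its closed-form limit. It also adopts one plausible reading of the two schedules: concurrent replay pools a small buffer of old-task samples with the new-task samples in a single objective, while sequential replay trains on the new task alone and then runs a separate phase on the buffer. Forgetting is taken to be the increase in old-task loss after the new task is learned. The names `fit_from`, `mse`, `run`, `task_gap`, and the buffer size `m` are hypothetical, chosen only for this example.

```python
import numpy as np


def fit_from(w0, X, y):
    """Closed-form limit of gradient descent on squared loss, started at w0.

    With a small enough step size, gradient descent on a linear least-squares
    problem converges to the least-squares solution closest to w0, which is
    w0 + pinv(X) @ (y - X @ w0). Using the limit keeps the toy deterministic.
    """
    return w0 + np.linalg.pinv(X) @ (y - X @ w0)


def mse(w, X, y):
    """Mean squared error of the linear predictor X @ w."""
    return float(np.mean((X @ w - y) ** 2))


def run(task_gap=1.0, d=20, n=30, m=10, seed=0):
    """Compare no replay, concurrent replay, and sequential replay on two tasks.

    Hypothetical toy setup: noiseless linear regression, n samples per task,
    d features, and a replay buffer holding the first m task-1 samples.
    """
    rng = np.random.default_rng(seed)
    w1 = rng.standard_normal(d)
    # Larger task_gap means a more dissimilar second task.
    w2 = w1 + task_gap * rng.standard_normal(d)
    X1 = rng.standard_normal((n, d))
    X2 = rng.standard_normal((n, d))
    y1, y2 = X1 @ w1, X2 @ w2
    Xb, yb = X1[:m], y1[:m]

    # Learn task 1 from scratch; its task-1 loss is the forgetting baseline.
    w_task1 = fit_from(np.zeros(d), X1, y1)
    baseline = mse(w_task1, X1, y1)

    # No replay: train on task 2 only.
    w_none = fit_from(w_task1, X2, y2)
    # Concurrent replay: buffer samples pooled with task-2 samples in one objective.
    w_conc = fit_from(w_task1, np.vstack([X2, Xb]), np.concatenate([y2, yb]))
    # Sequential replay: task 2 alone, then a separate phase on the buffer only.
    w_seq = fit_from(w_none, Xb, yb)

    results = {}
    for name, w in [("none", w_none), ("concurrent", w_conc), ("sequential", w_seq)]:
        results[name] = {
            "forgetting": mse(w, X1, y1) - baseline,
            "task2_loss": mse(w, X2, y2),
            "buffer_loss": mse(w, Xb, yb),
        }
    return results


if __name__ == "__main__":
    for gap in (0.1, 1.0):
        for name, metrics in run(task_gap=gap).items():
            summary = ", ".join(f"{k}={v:.3f}" for k, v in metrics.items())
            print(f"task_gap={gap}  {name:>10}: {summary}")
```

Raising `task_gap` makes the two tasks more dissimilar while the sampled inputs stay fixed for a given seed. Because concurrent replay here is the exact minimizer of the pooled task-2-plus-buffer loss, it can never score worse on that pooled objective. The schedules differ in how they split the remaining error among task-2 samples, buffered task-1 samples, and task-1 samples that are never replayed. The printed numbers describe only this toy setting.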
