CAREER: Learning from demonstrations and beyond -- consolidating imitation and reinforcement learning

$463,572 | FY2023 | CSE | NSF

Texas A&M Engineering Experiment Station, College Station TX

Abstract

Recent advancements in deep reinforcement learning (RL) hold unprecedented potential for automating and optimizing control of real-world tasks such as autonomous driving, traffic management, medical procedures, robotic manufacturing, and energy management. Unfortunately, it is common for RL algorithms to exhibit unstable and/or inefficient learning, which limits their applicability. Seeking to address this critical concern, this CAREER project leverages imitation learning (IL), or behavior copying, which is better understood and typically more stable. The project targets the unification of IL and RL into a holistic paradigm that can safely and effectively learn from, and outperform, existing solutions. This project will address outstanding knowledge gaps in both types of learning through a novel curriculum decomposition of the tasks, where simplified demonstrations are used to bootstrap the learner’s behavior. The project will also foster education and outreach activities. Specifically, it will enhance undergraduate STEM training by providing students with exposure to scientific research and knowledge discovery processes relating to safety-critical AI applications through an original multidisciplinary undergraduate engineering program. Moreover, it will facilitate a unique K-12 outreach activity within a large community in Bryan, TX. The project will support and advance an existing research collaboration with an industrial partner in the context of defense technology. This collaboration, in turn, is expected to advance the US national defense.

This project will form the basis for a new research thrust in machine learning (ML), one that combines IL and RL toward a holistic, robust, and safe learning framework. It will define and prove a no-regret bound on the training process within the Markov decision process (MDP) formalization. The approach is to reduce an IL problem to an RL one that includes a domain-independent curriculum-learning trajectory. The resulting algorithms and solutions are expected to achieve state-of-the-art performance in complex control domains as well as to deepen theoretical understanding of the potential and limitations of the resulting solutions. Specifically, the research seeks to prove conditions guaranteeing policy convergence and monotonic improvement during training. Moreover, the project will develop domain-specific adaptation to and analysis of real-world applications (autonomous driving and robotics testbeds) while providing stable and efficient RL from demonstrations.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
