CAREER: Theoretical Foundations of Offline Reinforcement Learning

$500,000 | FY2022 | CSE | NSF

University of Illinois at Urbana-Champaign, Urbana, IL

Abstract

This award is funded in whole or in part under the American Rescue Plan Act of 2021 (Public Law 117-2).

Reinforcement learning (RL) is a subarea of Artificial Intelligence (AI) that solves complex decision-making tasks. It has achieved impressive successes in simulator-defined problems, where the RL agent learns via trial-and-error inside a virtual "online" environment. However, it is difficult to apply these online algorithms to real-world problems, as trial-and-error is often expensive or impossible in real life. For example, it is unethical for an RL agent in personalized medicine to test a new treatment strategy that may harm patients, just for the purpose of gathering new information. A promising paradigm for addressing this issue is offline RL, where the agent learns solely from historical data. While the lack of direct interaction with the real environment prevents undesirable real-world consequences, it also gives rise to significant technical challenges in learning. This project aims to develop novel methods to address these challenges, provide a deep theoretical understanding of offline RL, and make significant progress toward enabling offline RL in real-life applications such as robotics, adaptive medical treatment, and online recommendation systems. The research will also be integrated into the project's educational plan, which includes advising students and developing new courses and a monograph on reinforcement learning.

The technical aims of the project consist of two thrusts. The first thrust focuses on the problem of model selection: after training is completed, how should we select between candidate policies on a holdout dataset? Model selection enables hyperparameter tuning, which is the backbone of practical machine learning, yet it is notoriously difficult in offline RL due to the multi-stage nature of the problem. The proposal describes a promising approach that builds on the investigator's recent theoretical work on value-function selection. The project will devise empirically effective methods based on these theoretical insights and address practical issues such as poorly fitted candidate functions and data with insufficient coverage. The second thrust considers the theoretical foundation of offline RL training: under what conditions can we guarantee the success of training? The proposal lays out the theoretical landscape of offline RL training and identifies important open questions and opportunities for discovering novel theoretical and algorithmic insights.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
