CAREER: Foundations of Reinforcement Learning under Partial Observability

$500,000 | FY2023 | CSE | NSF

Princeton University, Princeton NJ

Investigators

Abstract

A wide range of modern artificial intelligence challenges can be cast as Reinforcement Learning (RL) problems under partial observability, in which agents learn to make a sequence of decisions despite lacking complete information about the moment-to-moment situation in which those decisions are made. Natural applications of this kind of Partially Observable RL (PORL) include robotics, autonomous driving, imperfect-information games, resource allocation under partial information, planetary exploration, and medical diagnostic systems. As such, PORL has been an important topic in operations research, control, and machine learning. While the community has recently witnessed a surge of breakthroughs in reinforcement learning theory for fully observable environments, our understanding of learning to act in partially observable systems remains very limited. Partial observability brings a series of unique challenges to RL in modeling, algorithm design, and theoretical analysis. Resolving these challenges will have far-reaching impacts in academia, industry, and society, wherever modern RL can be applied. This project aims to identify and attack these challenges, establish solid theoretical foundations, and design new reliable and efficient algorithms for PORL.

Concretely, this project will study PORL in three progressive thrusts. Thrust 1 considers the basic tabular setup, under the model of Partially Observable Markov Decision Processes (POMDPs). The main objective in this thrust is to identify the key structural conditions that permit statistically or computationally efficient learning, and to address the core challenges of inferring latent states and exploration. Thrust 2 concerns modern PORL with an enormous number of states and observations, where function approximation must be deployed to approximate the models, the value functions, or the policies. We will investigate these problems under the more general model of Predictive State Representations (PSRs) and develop efficient learning results in the presence of function approximation. Thrust 3 investigates PORL in the multiagent setting, under the model of Partially Observable Markov Games (POMGs). We will design efficient algorithms for learning various equilibria in POMGs and address the unique challenges arising from multiagency and the design of decentralized algorithms.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
