CIF: SMALL: Theoretical Foundations of Partially Observable Reinforcement Learning: Minimax Sample Complexity and Provably Efficient Algorithms

$483,663FY2023CSENSF

University Of California-Berkeley, Berkeley CA

Investigators

Abstract

Many reinforcement learning (RL) systems operate within environments that provide only partial observations and imperfect information to the agents. Despite notable empirical success, partially observable RL models still present considerable theoretical challenges, potentially posing significant risks to sensitive tasks. This project will design efficient learning algorithms and provide sharp sample complexity analyses for partially observable RL systems. The theoretical tools will build on a broad range of subjects, including machine learning, information theory, control theory, and high-dimensional statistics. The developed results will have impact on a variety of applications such as robotic control, autonomous driving, and strategic games. The investigator is committed to fostering diversity by actively recruiting and training students, particularly those from underrepresented minorities and women in Science, Technology, Engineering, and Math (STEM). This project will tackle the theoretical challenges in learning two partially observable RL models: partially observable Markov decision processes (POMDPs) and extensive-form games (EFGs). The main goal is to provide theoretical tools and new insights to developing algorithms and proving sharp statistical complexity bounds. The first component will focus on POMDPs, with the goal of closing the sample complexity gap of learning in the basic tabular setting and addressing the computational challenges by identifying structural conditions that admit planning efficiency. The second component will focus on EFGs, with the goal of designing near-optimal algorithms for three types of regret: external regret, Phi-regret, and dynamic regret. The proposed algorithms and sharp statistical complexity bounds will provide a solid theoretical foundation for future research of RL theorists and practitioners. These algorithms will be coded and tested within the OpenSpiel environment to evaluate their empirical performance. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

View original record on NSF Award Search →