Collaborative Research: Towards the Foundation of Approximate Sampling-Based Exploration in Sequential Decision Making

$300,000 · FY2023 · MPS · NSF

University Of California-Los Angeles, Los Angeles CA

Investigators

Abstract

Sequential decision-making problems, such as bandits and reinforcement learning, play a crucial role in AI applications including recommendation systems, robotics, games, and personalized healthcare. The main challenge is finding an exploration strategy that balances choosing the actions with the best estimated performance against choosing actions whose outcomes are highly uncertain. Existing exploration strategies, however, are tailored to specific settings and require prior knowledge of the reward distribution, the function approximation, and the task at hand. This creates computational obstacles and hampers real-world applicability. This project aims to establish a theoretical foundation for using approximate sampling-based techniques to unify exploration strategies across different sequential decision problems. The goal is to develop efficient and provable algorithms applicable to diverse learning problems under a unified algorithmic framework based on approximate sampling. This project also provides research training opportunities for graduate students.

The project consists of three tasks. Task one focuses on developing fast approximate sampling-based exploration strategies for contextual bandit problems, accompanied by theoretical guarantees. Task two involves implementing these exploration algorithms and generalizing them to more complex sequential decision-making applications that leverage deep neural networks. Task three aims to establish efficient and provably effective exploration strategies for reinforcement learning problems. These advancements will be translated into accessible tools, with verifiable guarantees, for various bandit and reinforcement learning applications. The open-source software and course materials resulting from this project will be made publicly available, benefiting research, education, and society at large. This award by the Division of Mathematical Sciences is jointly supported by the NSF Office of Advanced Cyberinfrastructure.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
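
As a minimal illustration of the sampling-based exploration idea described in the abstract, the sketch below runs a Thompson-sampling-style strategy on a small multi-armed bandit, drawing each arm's plausible mean reward with a few Langevin steps rather than an exact posterior draw. It is not the project's algorithm. The Gaussian reward model, the N(0, 1) prior, the step size, and the names `langevin_sample` and `run_bandit` are all assumptions chosen for illustration.

```python
import math
import random


def langevin_sample(n_pulls, reward_sum, theta, steps=20, rng=random):
    """Approximately sample an arm's mean reward from its posterior.

    Assumption: unit-variance Gaussian rewards and a N(0, 1) prior, so the
    exact posterior is N(reward_sum / (n_pulls + 1), 1 / (n_pulls + 1)).
    """
    precision = n_pulls + 1.0
    step = 0.5 / precision  # step * precision = 0.5, which keeps the chain stable
    for _ in range(steps):
        grad = reward_sum - precision * theta  # gradient of the log posterior
        theta += step * grad + math.sqrt(2.0 * step) * rng.gauss(0.0, 1.0)
    return theta


def run_bandit(true_means, horizon=2000, seed=0):
    """Play `horizon` rounds and return how often each arm was pulled."""
    rng = random.Random(seed)
    k = len(true_means)
    pulls = [0] * k
    sums = [0.0] * k
    thetas = [0.0] * k  # warm-start each chain from its previous sample
    for _ in range(horizon):
        for a in range(k):
            thetas[a] = langevin_sample(pulls[a], sums[a], thetas[a], rng=rng)
        # Acting greedily on sampled means explores uncertain arms automatically:
        # arms with few pulls have wide posteriors and sometimes sample high.
        arm = max(range(k), key=lambda a: thetas[a])
        reward = rng.gauss(true_means[arm], 1.0)
        pulls[arm] += 1
        sums[arm] += reward
    return pulls


pulls = run_bandit([0.1, 0.5, 0.9])
print(pulls)
```

Under these assumptions the pull counts should concentrate on the arm with the highest true mean as the horizon grows, while the other arms still receive occasional pulls early on. Replacing the exact posterior draw with a short Langevin chain is what makes the sampling approximate. In this toy model the exact draw is available in closed form, so the chain is used only to show the mechanism.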
