CAREER: Non-asymptotic, Instance-optimal Closed-loop Learning

$506,915FY2022CSENSF

University Of Washington, Seattle WA

Investigators

Abstract

Machine Learning and Artificial Intelligence can recognize and exploit hidden patterns in data in order to predict future outcomes in applications ranging from content recommendation to personalized medicine. However, there are many problem areas where collecting the data is time-consuming (e.g., cells need to grow in lab environments) or expensive (e.g., special materials or expert opinions are required). Ideally, in order to reduce the amount of data needed to reach conclusions, already-collected data can be leveraged to guide the selection of future measurements in a closed-loop manner. While the behavior and benefits of some closed-loop data collection strategies are well understood in simple settings, this family of strategies is not commonly employed in real-world scientific laboratories or in medical trials due to a lack of predictability and accuracy of the outcomes. This project seeks to make foundational contributions to the understanding of closed-loop learning strategies with a view towards designing new data-collection strategies that are both effective and reliable. In practice, this may lead to requiring fewer patients in a clinical trial or to halving the time to identify a disease-curing drug. The investigator also plans to engage with high-school students and machine-learning enthusiasts alike to increase their level of awareness around data collection -- for instance, how even a simple survey, if not carefully designed, can result in privacy violations, demographic under-representation and bias of many forms, all of which may lead to inaccurate conclusions. For many problems of interest in closed-loop learning, prior art has focused only on minimax optimality, where the sample complexity of the worst-case problem instance is minimized. This approach leads to algorithms that are significantly inferior on "easy" or benign instances that may occur in nature but which are far from adversarial. In contrast this project will study the fundamental limits of instance-optimal sample complexity for problems of interactive learning and reinforcement learning in the Probably Approximately Correct (PAC) setting. The insights to be gained will be applied towards the design of algorithms that automatically adapt to the intrinsic difficulty of the particular problem instance being faced, be it benign or not. The proposed approach is motivated by the observation that the instance-optimal sample complexity decomposes into an asymptotic term, which is by now well characterized, and a moderate-confidence term, which is known to dominate the asymptotic term for all practical purposes. As the properties of the latter term are still poorly understood, lower bounds for it will be constructed together with algorithms that nearly achieve them. Such results will lead to algorithms that greatly reduce the overall instance-optimal sample complexity and vastly improve upon state-of-the-art algorithms that tend to cater to worst-case scenarios. The efforts will initially focus on structured linear bandits and reinforcement learning in the tabular and linear-function approximation settings. While these paradigms are of wide applicability to practice, they also have enough complexity to allow insights to be extrapolated to more generic closed-loop learning paradigms. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

View original record on NSF Award Search →