Multi-armed Bandit Problems with Covariates
University of Minnesota-Twin Cities, Minneapolis, MN
Investigators
Abstract
Multi-armed bandit (MAB) refers to a class of sequential decision-making problems in which, at each step, one must choose a population from which a random reward will be generated. The goal is to maximize the total accumulated reward. The literature on MAB, with few exceptions, ignores available covariates. In this project, the PI will study MAB with covariates in general frameworks and develop methodologies and theories for various applications. The project will 1) provide methods for selecting key covariates; 2) establish consistency in variable selection; 3) establish consistency of the allocation rule in terms of the accumulated reward; and 4) derive the rate of convergence of the accumulated reward relative to that of the oracle choices. In addition, nonparametric estimation of the mean reward functions and model combinations will be used to achieve higher expected reward. Strategies that simultaneously achieve high expected reward and provide sufficient information for identifying the best arm (with high probability) will be sought.
In the practice of medicine, treatments previously shown in clinical trials to be the best at the population level are given to new patients with minimal consideration of their personal characteristics, such as genetic profile. If practically feasible, there is every reason for a patient to be treated in a way that takes into account the outcomes of all previous treatments of patients with the same disease, so that the most promising individualized treatment is selected based on genetic information, clinical assessments, and all the accumulated trial and treatment results. The proposed research will set up statistical frameworks and build theories and methodologies for applying individualized medicine using the statistical machinery of sequential allocation with covariates.
Besides medicine, sequential allocation has applications in operations research, industrial engineering, economics, and other fields. Because the exponential growth of modern technology has made information easy to obtain and process, and with new research on the effective use of key predictors, applications of sequential allocation with covariates will make a real impact: saving lives, improving health, promoting business, and reducing operating costs for society.
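For readers unfamiliar with the setting, the sequential allocation problem described in this abstract can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions, not the project's method: the two mean reward functions, the scalar covariate, the epsilon-greedy rule, and the histogram (binned) estimator of the mean reward functions are all hypothetical choices made for the example. It returns the ratio of the accumulated expected reward to that of an oracle that always picks the best arm for each covariate value.

```python
import random


def simulate(n_steps=5000, n_bins=10, epsilon=0.1, seed=0):
    """Epsilon-greedy allocation with one covariate (illustrative only)."""
    rng = random.Random(seed)
    # Hypothetical mean reward functions of a covariate x in [0, 1).
    mean_reward = [lambda x: x, lambda x: 1.0 - x]
    n_arms = len(mean_reward)
    # Histogram estimator: running reward sums and counts per arm and bin.
    sums = [[0.0] * n_bins for _ in range(n_arms)]
    counts = [[0] * n_bins for _ in range(n_arms)]
    earned = 0.0  # accumulated expected reward of the allocation rule
    oracle = 0.0  # accumulated expected reward of the oracle choices
    for _ in range(n_steps):
        x = rng.random()  # covariate observed before choosing an arm
        b = min(int(x * n_bins), n_bins - 1)
        untried = [a for a in range(n_arms) if counts[a][b] == 0]
        if untried:
            arm = untried[0]  # try every arm once in each bin
        elif rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore at random
        else:
            # Exploit: pick the arm with the highest estimated mean in this bin.
            arm = max(range(n_arms), key=lambda a: sums[a][b] / counts[a][b])
        reward = mean_reward[arm](x) + rng.gauss(0.0, 0.1)  # noisy reward
        sums[arm][b] += reward
        counts[arm][b] += 1
        earned += mean_reward[arm](x)
        oracle += max(f(x) for f in mean_reward)
    return earned / oracle


if __name__ == "__main__":
    print(f"reward ratio vs. oracle: {simulate():.3f}")
```

A ratio close to 1 means the rule is nearly matching the oracle. Because this sketch keeps the exploration rate fixed at `epsilon`, its ratio stays bounded away from 1; letting the exploration rate shrink over time is one common way to pursue the kind of consistency the abstract refers to.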