RI: Small: Collaborative Research: RUI: Batch Learning from Logged Bandit Feedback

$132,000FY2016CSENSF

Ithaca College, Ithaca NY

Investigators

Abstract

Log data is one of the most ubiquitous forms of data available, as it can be recorded from a variety of systems (e.g., search engines, recommender systems, ad placement platforms) at little cost. Making huge amounts of log data accessible to learning algorithms provides the potential to acquire knowledge at unprecedented scale. Furthermore, the ability to learn from log data can enable effective machine learning even in systems where manual labeling of training data is not economically viable. Log data, however, provides only partial information -- "contextual-bandit feedback" -- limited to the particular actions taken by the system. The feedback for all the other actions the system could have taken is typically not known. This makes learning from log data fundamentally different from traditional supervised learning, where "correct" predictions together with a loss function provide full-information feedback. This project tackles the problem of Batch Learning from Bandit Feedback (BLBF) by developing principled learning methods and algorithms that can be trained with logs containing contextual-bandit feedback. First, the project develops the learning theory of BLBF, especially with respect to understanding the use and design of counterfactual risk estimators for BLBF. Second, the project derives new learning methods for BLBF. Past work has already demonstrated that Conditional Random Fields can be trained in the BLBF setting, and the project derives BLBF analogs of other learning methods as well. Third, the project derives scalable training algorithms for these BLBF methods to enable large-scale applications. And, finally, the project validates the methods with real-world data from operational systems.

View original record on NSF Award Search →