
EAGER: Training A Mobile Robot from Human Feedback via Income Learning

$70,000 · FY2016 · CSE · NSF

Brown University, Providence RI

Investigators

Abstract

As cyberphysical systems become more widespread, there is a growing number of complex tasks that they can usefully perform to assist human users. Tasks are typically formalized in the sequential decision framework, where the learner perceives states, takes actions, and receives a reward feedback signal. In practice, there is a critical need to learn directly from human users if such machines are to accomplish tasks beyond those pre-specified by the original developers. This project will develop new algorithms that can learn more effectively from humans. We will evaluate these algorithms on both virtual agents and robot platforms. We will investigate whether and how non-expert humans can construct sequences of tasks of increasing difficulty, similar to how expert animal trainers shape tasks. Insights from these user studies will be leveraged to further improve our algorithms' ability to learn from human trainers. If successful, this project will make critical progress toward allowing non-technical users to teach virtual and physical agents to perform complex tasks in a natural setting, one familiar to many from previous experience training household pets.

This project is part of a larger effort between Washington State University (WSU), North Carolina State University, and Brown University. The Brown effort will focus on deriving a well-motivated learning algorithm (tentatively called "I-learning") and understanding its theoretical properties. Of particular interest is the behavior of these algorithms in settings that are well studied in the reinforcement-learning community, such as Markov decision processes, k-armed bandits, and learning with function approximation. Algorithms will be implemented and tested on virtual and physical platforms (robots), and broader impacts on education and control will be pursued.
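
For readers unfamiliar with the sequential decision framework mentioned in the abstract, the following is a minimal Python sketch of the simplest setting it names, a k-armed bandit. It is not the project's "I-learning" algorithm, which the abstract does not specify; it is a standard epsilon-greedy learner, and the function name `run_bandit`, the `approval_probs` parameter, and the simulated binary "trainer approval" signal are all assumptions made for illustration.

```python
import random


def run_bandit(approval_probs, steps=2000, epsilon=0.1, seed=0):
    """Epsilon-greedy learner on a k-armed bandit (illustrative only).

    approval_probs[a] is a hypothetical probability that a simulated
    trainer approves of action a; a real human trainer would replace it.
    """
    rng = random.Random(seed)
    k = len(approval_probs)
    counts = [0] * k    # how often each action has been taken
    values = [0.0] * k  # running mean of feedback per action

    for _ in range(steps):
        # Take an action: explore with probability epsilon, else exploit.
        if rng.random() < epsilon:
            action = rng.randrange(k)
        else:
            action = max(range(k), key=lambda a: values[a])

        # Receive a reward feedback signal (simulated trainer approval).
        reward = 1.0 if rng.random() < approval_probs[action] else 0.0

        # Incrementally update the running mean for the chosen action.
        counts[action] += 1
        values[action] += (reward - values[action]) / counts[action]

    return values, counts


if __name__ == "__main__":
    values, counts = run_bandit([0.2, 0.5, 0.8])
    print("estimated values:", values)
    print("action counts:", counts)
```

A bandit has a single state, so the "perceives states" step is trivial here; a Markov decision process would add a state variable that changes in response to each action.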
