EAGER: Formal Models of Trainer Feedback for I-Learning Theoretical Guarantees

$70,043 · FY2016 · CSE · NSF

North Carolina State University, Raleigh NC

Abstract

As virtual agents and physical robots become more common, there is an increasing number of complex tasks they can usefully perform to assist humans. These tasks are typically formalized as sequential decision tasks, in which robots and agents perceive states, take actions, and receive a reward feedback signal. In practice, there is a critical need to learn directly from human users, because the majority of human users will not be able to directly program or fully specify a useful reward function. On the other hand, they can likely train an agent to perform tasks unanticipated by the original designer. Machine reinforcement learning (RL), a paradigm often used for solving sequential decision-making tasks, was originally developed with inspiration from animal learning research from the applied behavior analysis (ABA) community. Existing RL approaches operationalize a limited set of ABA principles effectively; however, there are additional principles and properties from ABA research that are not well encapsulated in the existing RL formalisms, and that are likely sources of new inspiration for designing more effective RL techniques capable of learning from human teachers. The objective of this project is to leverage insights from animal training to reformulate the learning of sequential tasks from an agent learning alone in a fixed environment to an agent learning cooperatively with a competent, but not necessarily perfect, human teacher. Successful completion of this project will contribute a foundation of knowledge that will aid in the development of new technologies that allow end users to customize the functions of their gadgets. This project is part of a larger collaborative effort between North Carolina State University (NCSU), Brown University, and Washington State University (WSU). The NCSU effort will include theoretical contributions along with empirical analyses and data collection.
The emphasis of the NCSU portion of the project will be on the development of theoretical models of human feedback. When humans provide rewards to learning machines, describing the properties of the algorithms those machines use requires knowledge of how the humans provide feedback: for example, when and how they make errors, and the circumstances in which they provide reinforcement or punishment or use extinction. Understanding the theoretical properties of I-Learning under different trainer paradigms will be the primary effort of NCSU project personnel. NCSU personnel will also work in concert with collaborators at Brown to use these models of feedback to describe the performance properties of I-Learning under different assumptions about trainer behavior. In addition, NCSU personnel will work with WSU collaborators to collect data from human trainers in virtual settings in order to validate the theoretical models and set their parameters.
