Collaborative Research: RI: Medium: Bootstrapping natural feedback for reinforcement learning

$1,440,000 | FY2022 | CSE | NSF

Massachusetts Institute Of Technology, Cambridge MA

Investigators

Abstract

Many modern applications of artificial intelligence, from industrial automation to content recommendation, depend on machine learning algorithms that train automated agents to interact with their environments. But the two main approaches to interactive learning, reinforcement learning and imitation, require so much supervision or training time that it is prohibitively expensive to apply them to most real-world problems. Human learning does not suffer from this shortcoming, in large part because humans learn not from rewards or demonstrations, but instead from extended interaction with skilled teachers who use signals like gesture and language. This project will lay a foundation for research on interactive learning with rich feedback, from the perspective of individual agents, human-agent teams, and multi-agent populations. It will yield new capabilities for interactive training of automated agents, expanding both the effectiveness and accessibility of such techniques. Support for natural, interactive feedback will also improve the customizability of such systems, making on-the-fly adaptation or retraining accessible to users without significant computing power, data annotation resources or even programming ability.

The project is organized into three broad research objectives. First, it will develop a formal framework for grounding feedback, using simple supervisory signals (provided during or after execution) to bootstrap learned interpretation of more complex feedback types. Second, it will develop algorithms for learning to solicit feedback. These algorithms will turn the one-way process of reinforcement learning into a two-way interaction, enabling agents to proactively query supervisors for information about the compositional and causal structure of the environment. Third, it will develop new mechanisms and techniques for providing feedback, via software tools that assist human supervisors in selecting or generating maximally informative feedback signals.

Research under each of these objectives will be carried out in simulated environments, benchmarked using complex tasks spanning navigation, robot manipulation, and furniture assembly, and evaluated in terms of its benefits to sample efficiency, end-to-end development time, and usability. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
