Average Reward Reinforcement Learning: Scaling up

$368,658 | FY2001 | CSE | NSF

Oregon State University, Corvallis OR

Abstract

This is the first-year funding of a three-year continuing award. Imagine a factory of the future that teems with intelligent autonomous robots and machines engaged in production: machines that can not only sense and act but also optimize their own behavior without being explicitly programmed, and robots that can learn to coordinate their actions with other robots in order to satisfy an overall optimization criterion. The ability to build such machines and robots would radically reorganize factories, so that people's role is reduced to specifying an optimization criterion and giving feedback to the machines, leaving the low-level control and optimization issues to the machines themselves. This project seeks to design and study the algorithmic and computational tools necessary to build such machines and robots. The long-term scientific goal is to gain a better understanding of the tradeoffs involved in the design of adaptive autonomous multi-agent systems; in particular, the tradeoffs among the optimality of the behavior, computational and communication efficiency, generality, and speed of learning. Reinforcement learning, which optimizes the performance of programs via rewards and punishments, appears to be the most promising approach to building such adaptive multi-agent systems for complex real-world domains. Many real-world problems in manufacturing, such as production scheduling and inventory control, are best seen as "average-reward reinforcement learning" (ARL) problems, where the optimization criterion is to maximize the average reward received per unit of time. The goal of this project is to develop scalable algorithms, programs, and techniques for solving large ARL problems, with manufacturing as the primary application domain.
The PI will push the frontiers of this technology to the point where it can be applied to factories with hundreds of machines and job types under realistic conditions such as partial observability and multiple interacting agents. Successful completion of this project will lead to new scalable algorithms and programs for solving large ARL problems, which could well have significant economic impact.
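
As an illustration of the average-reward criterion described above, the following minimal sketch compares two fixed policies on a toy machine-maintenance problem by their long-run reward per time step. It is not taken from the project: the states, actions, transition probabilities, rewards, and the helper name `average_reward` are all hypothetical, and it only evaluates given policies rather than learning one.

```python
# Toy machine-maintenance problem; every state, action, and number here is hypothetical.
# P[state][action] = list of (next_state, probability); R[state][action] = immediate reward.
P = {
    "good": {"produce": [("good", 0.7), ("worn", 0.3)], "maintain": [("good", 1.0)]},
    "worn": {"produce": [("worn", 1.0)], "maintain": [("good", 1.0)]},
}
R = {
    "good": {"produce": 10.0, "maintain": -5.0},
    "worn": {"produce": 4.0, "maintain": -5.0},
}


def average_reward(policy, start="good", steps=20000):
    """Approximate the long-run expected reward per time step of a fixed policy.

    Computes (1/T) * sum of expected rewards over T steps by pushing the
    state distribution forward one step at a time (no random sampling).
    """
    dist = {s: 0.0 for s in P}
    dist[start] = 1.0
    total = 0.0
    for _ in range(steps):
        # Expected reward collected at this step under the current distribution.
        total += sum(p * R[s][policy[s]] for s, p in dist.items())
        # Advance the distribution through the transitions chosen by the policy.
        nxt = {s: 0.0 for s in P}
        for s, p in dist.items():
            for s2, q in P[s][policy[s]]:
                nxt[s2] += p * q
        dist = nxt
    return total / steps


always_produce = {"good": "produce", "worn": "produce"}
repair_when_worn = {"good": "produce", "worn": "maintain"}

gains = {
    "always_produce": average_reward(always_produce),
    "repair_when_worn": average_reward(repair_when_worn),
}
best = max(gains, key=gains.get)
print(gains, best)
```

With these made-up numbers, always producing settles at about 4 reward per step once the machine wears out, while repairing a worn machine yields about 6.54 per step, so the second policy is preferred under the average-reward criterion even though maintenance has an immediate cost.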
