CIF: Small: How Much of Reinforcement Learning is Gradient Descent?

$301,244FY2023CSENSF

Trustees Of Boston University, Boston

Investigators

Abstract

In the past decade, reinforcement learning has achieved remarkable success in a wide range of applications, from games such as chess and go to advanced applications such as chip design and aerial navigation. There is now ample evidence that reinforcement learning represents one of the most promising research directions to deliver the next generation of autonomous systems. However, many popular reinforcement-learning methods often fail to converge, making the use of reinforcement learning in practice more an art than a science. This project will explore a novel approach to analyzing and designing convergent reinforcement-learning methods based on a recently discovered connection to gradient descent. This connection will not only improve the analysis of existing algorithms but also lead to the development of new methods. This project builds on a novel concept, gradient splitting, which allows classical reinforcement-learning methods to be viewed as modifications of stochastic-gradient-descent updates, which inherit many key properties of gradient descent. We will use this connection to develop variations of temporal difference learning and Q-learning which, when given a dataset sampled from a Markov decision process, will converge geometrically to the statistically optimal estimate of the true value function. Coupled with neural-network approximation, our methods will approximate the true value function with an additional error that is inversely proportional to a power of the width of the underlying neural network. These results will then be used to develop a provably convergent neural actor-critic method. The new methods we will develop will not only provide rigorous bounds on the performance of neural networks in reinforcement learning but also will result in significantly faster training times compared to existing methods. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

View original record on NSF Award Search →