Optimal Time-Reversible Policies for Markov Decision Processes

$235,078 · FY2010 · ENG · NSF

University Of Virginia Main Campus, Charlottesville VA

Investigators

Abstract

The objective of this project is to develop theory and algorithms for Markov decision processes (MDPs) with reversible dynamics. Reversible Markov chains arise in numerous areas, including queueing systems, models of caching schemes, biological models, models of diffusion processes, and Markov chain Monte Carlo techniques. The methods developed in this project will yield algorithms for computing optimal control policies for stochastic control problems involving reversible Markov chains. By exploiting linear programming formulations of MDPs together with equilibrium conditions for reversible systems, optimality conditions for reversible MDPs will be established. Several classes of algorithms that seek solutions to these optimality conditions will be developed, including a form of the policy iteration algorithm and an algorithm based on simulation. The methods developed in this project may also be extended to settings where effective suboptimal policies can be obtained by approximating general MDPs by reversible MDPs. If successful, the results of this research will provide computationally efficient algorithms for a broad class of stochastic control problems, and might also provide insight into the structural properties of the solutions of some well-known queueing problems. The algorithms to be developed can be implemented using significantly less computation and storage than conventional algorithms for Markov decision processes. One class of algorithms to be developed in this project can be implemented whenever a reversible system can be simulated, requiring very little model information beyond the output of the simulation.

Additionally, this research program will be coupled with two educational activities. As part of this project, a new seminar course on the intersection of stochastic control, optimization, and game theory will be offered to graduate students at the University of Virginia. Also, a publicly available course reader will be developed for a graduate course on stochastic systems, emphasizing engineering applications and algorithms for analyzing stochastic models.
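
For readers unfamiliar with the reversibility property the abstract refers to, the following is a minimal sketch of the standard detailed-balance test for a finite Markov chain: a chain with transition matrix P and stationary distribution pi is reversible when pi[i] * P[i][j] == pi[j] * P[j][i] for every pair of states. It is not one of the project's algorithms. The function names, the birth-death example (a simple discrete-time queue-like chain), and all parameter values are illustrative assumptions.

```python
def birth_death_chain(n_states, p_up, p_down):
    """Transition matrix of a birth-death chain on states 0..n_states-1.

    Hypothetical example: from state i the chain moves up with probability
    p_up, down with probability p_down, and otherwise stays put.
    """
    P = [[0.0] * n_states for _ in range(n_states)]
    for i in range(n_states):
        if i + 1 < n_states:
            P[i][i + 1] = p_up
        if i - 1 >= 0:
            P[i][i - 1] = p_down
        # Remaining probability mass is a self-loop, so each row sums to 1.
        P[i][i] = 1.0 - sum(P[i])
    return P


def stationary_distribution(P, iters=10_000):
    """Approximate the stationary distribution by repeated multiplication.

    Assumes the chain is irreducible and aperiodic so the iteration converges.
    """
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi


def is_reversible(P, pi, tol=1e-9):
    """Check detailed balance: pi[i] * P[i][j] == pi[j] * P[j][i] for all i, j."""
    n = len(P)
    return all(
        abs(pi[i] * P[i][j] - pi[j] * P[j][i]) <= tol
        for i in range(n)
        for j in range(n)
    )


# A birth-death chain satisfies detailed balance.
P_queue = birth_death_chain(n_states=5, p_up=0.3, p_down=0.5)
queue_reversible = is_reversible(P_queue, stationary_distribution(P_queue))

# A chain that mostly cycles 0 -> 1 -> 2 -> 0 does not.
P_cycle = [
    [0.0, 0.9, 0.1],
    [0.1, 0.0, 0.9],
    [0.9, 0.1, 0.0],
]
cycle_reversible = is_reversible(P_cycle, stationary_distribution(P_cycle))
```

The birth-death chain passes the check because probability flow between neighboring states balances in both directions, while the cyclic chain fails because its flow circulates in one preferred direction.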
