New Computational Approaches for Markov Decision Processes

$365,643FY2004ENGNSF

University Of Maryland, College Park, College Park MD

Investigators

Abstract

Developing practical computational solution methods for large-scale Markov Decision Processes (MDPs), also known as stochastic dynamic programming problems, remains an important and challenging research area. The complexity of many modern systems that can in principle be modeled using MDPs have resulted in models for which it is not possible to explicitly enumerate the transition probabilities, but for which sample paths can be easily generated, e.g., via a stochastic simulation model. The project research addresses two other distinct but crucial issues that arise: how best to allocate a computational budget that is used to generate sample paths, and how to produce a robust set of good policies directly (rather than indirectly via value function approximations). In particular, the main thrusts of our proposed approaches center on two distinct paradigms: effective sampling-based methodologies using multi-armed bandit models and induced correlation for value function estimation; and population-based approaches for finding improving policies, in contrast to the traditional policy iteration method, which iterates on a single policy. The latter thrust will focus on infinite horizon problems, where there is assumed an optimal stationary policy, whereas the former approaches are intended for finite horizon problems, where backwards induction dynamic programming must be employed. Algorithms will be developed and then analyzed in terms of their properties such as convergence rate and theoretical bounds on performance, followed by testing on specific application areas to investigate their practical utility. Specific problem domains include the pricing of American-style financial derivatives; capacity planning and preventive maintenance in manufacturing systems; and communication networks.

View original record on NSF Award Search →