RI: Small: Collaborative Research: Hidden Parameter Markov Decision Processes: Exploiting Structure in Families of Tasks

$208,000FY2017CSENSF

Brown University, Providence RI

Investigators

Abstract

Part 1 Machine learning has the potential to automate many complex, real-life tasks. However, learning algorithms typically require a substantial amount of data from each specific task they are asked to solve, requiring repeated interactions with the world, each of which take time and effort. Many real-life learning scenarios involve repeated interactions with tasks that are similar, but not identical. For example, an immunologist may encounter HIV patients with different comorbid conditions and latent viral reservoirs - each has a similar disease but a different progression, requiring individualized treatment; a robot may have to manipulate objects of different size and weight - each requiring similar but not identical grasping strategies. In such cases treating all of the tasks as the same results in poor performance, but learning to solve each as if they were completely different takes far too long. This project will develop intelligent agents that can use knowledge gained when solving prior tasks to much more rapidly learn new tasks that are similar but not quite the same. The principal technical component of this project will lie in rigorously defining what it means for tasks to be related and in producing algorithms for leveraging that definition to enable rapid learning. To do so, the project will introduce the Hidden-Parameter Markov Decision Process, which models a family of tasks through a parameter which describes variation through the family but is hidden from the learner. The project will investigate methods that exploit this structure by learning a model of task variation and then seeking to identify the parameter value for each specific task. The planned work will focus on healthcare applications, where families of related but distinct tasks are common (i.e. each patient will have unique characteristics). However, the project aims to produce foundational learning algorithms applicable to many application areas, ranging from robotics to systems design. This research will also be integrated into the courses taught by the PIs at Harvard and Brown and made available online; the PIs will include a diverse population, including REUs, both in these classes and in their research groups. Part 2 Many real-life learning scenarios involve repeated interactions with tasks that have similar, but not identical, dynamics. For example, an immunologist may encounter HIV patients with different comorbid conditions and latent viral reservoirs; a robot may have to manipulate objects of different size and weight. These cases describe a family of related tasks, each of which is similar but not quite the same. An intelligent agent should be able to transfer knowledge learned during previous experiences to rapidly solve new tasks in the same family. However, while many algorithms have been developed to transfer knowledge, the lack of a model of task relatedness inhibits our ability to formally understand the benefits of such algorithms or the structure they exploit. The planned work will model such scenarios by embedding the tasks on a low dimensional manifold that captures relevant variation between instances. Each location on this manifold (unobserved by the agent) describes a task instance, forming a sufficient statistic for solving the task in the context of the task family. Preliminary work by the PIs has shown that it is possible to learn such a manifold after solving just a few individual task instances and enable the rapid optimization of policies for new task instances. Building on these promising initial results, the PIs plan to: 1) Develop methods for task family characterization, by determining whether a collection of tasks can be modeled via a single manifold or consists of several clusters; whether a new task belongs to an existing cluster or manifold; and if so, and whether or not transfer is worthwhile. 2) Scale inference by adapting recent results from machine learning to deal with large state and action spaces. 3) Generate policies using Bayesian reinforcement learning algorithms, and by exploiting formal links between state and policy representations. In addition to synthetic domains, progress on these directions will be applied to problems of treatment optimization for patients with HIV, sepsis, and depression via clinical collaborations that the PIs have with world-experts in these diseases.

View original record on NSF Award Search →