CRII: RI: Inference for Probabilistic Programs: A Symbolic Approach

$174,639FY2017CSENSF

University Of California-Los Angeles, Los Angeles CA

Investigators

Abstract

Probabilistic machine learning and artificial intelligence have revolutionized the world and are present in most aspects of our life. However, the tools used to develop probabilistic machine learning solutions are limited in what they can express. Moreover, they require significant expert knowledge, and are not accessible to scientists in each discipline, let alone everybody else. Probabilistic programming aims to make probabilistic machine learning accessible to all, and as easy to program as a phone application. To make this dream a reality, probabilistic program execution, making probabilistic predictions from observations, has to become as highly efficient and robust as our current non-probabilistic software tools. This project develops general-purpose algorithms to execute probabilistic programs efficiently, using advanced symbolic reasoning techniques from artificial intelligence. Moreover, it does so for probabilistic programs that are significantly more complex than the ones in use today, involving a wide range of programming language features that are both discrete and continuous. This increase in scalability and expressive power will foster novel, increasingly advanced machine learning applications. More specifically, probabilistic programs subsume classical probabilistic graphical models and are additionally able to capture complex probabilistic dependencies that include arbitrary pieces of executable code. While many expressive probabilistic programming languages have been proposed in recent years, the current bottleneck and barrier to success is the lack of general-purpose reasoning algorithms to perform inference with probabilistic programs efficiently. This research tackles two key problems in probabilistic program inference. First, current sampling-based algorithms have problems reasoning about dependencies between large numbers of discrete random variables and explaining low-probability observations. In one thrust, this project develops new inference algorithms based on knowledge compilation. This technique compiles the program into a symbolic structure that is efficient for probability computation. The algorithm does not compile the entire program, which is generally intractable, but uses importance sampling on partially compiled programs to sample efficient subprograms. This combines the best of approximate program evaluation by sampling with highly efficient compilation techniques for exact inference. Second, symbolic approaches to inference are fundamentally discrete and have problems dealing with continuous and integer variables, which frequently appear in real code. Conversely, algorithms for continuous distributions cannot efficiently handle discrete program structure. In another thrust, this project studies symbolic approaches to probabilistic reasoning in programs with both types of structure, using recent breakthroughs based on satisfiability modulo theories and hashing-based sampling. This project provides a scientific leap at a fundamental level. It also provides a context for training undergraduate and graduate students in subjects spanning machine learning, artificial intelligence, statistics, and programming languages, and targets the integration of probabilistic programming into computer science curricula.

View original record on NSF Award Search →