Simulation and Inference Algorithms for Stochastic Differential Equations
University Of California - Merced, Merced CA
Investigators
Abstract
In fields such as neuroscience, biochemistry, molecular and cellular biology, statistical physics and finance, many researchers have invested time to build first-principle mathematical models. Stochastic differential equations (SDE), a subset of such models, naturally account for the combined effects of nonlinearity and noise, effects observed in many systems of contemporary interest. Such models often have large numbers of parameters that cannot be directly observed. Researchers collect data from the system they are studying and seek to use this data to infer parameters in their models. Alternatively, researchers who begin with data from a particular system typically desire forecasts regarding the future behavior of the system and, especially in scientific contexts, explanations as to why the system behaves the way that it does. In this project, the PI will develop algorithms, methods, and open-source software packages that significantly improve a researcher's ability to infer parameters in stochastic models, and to develop stochastic models with strong predictive and/or explanatory power. The new methods will naturally accommodate data with one or more pathologies, such as irregular observation times, large volume, and non-Gaussian (i.e., heavy-tailed) features. The crux of the inference problem is the computation of the likelihood function associated with time series observations of an SDE model. Prior work has shown that, among deterministic methods for this inference problem, methods based on Fokker-Planck solvers are the most accurate. For SDE driven by standard Brownian motion, density tracking by quadrature (DTQ) computes likelihoods orders of magnitude faster than comparable Fokker-Planck solvers for the same level of error. Building on this foundation, this work seeks to develop DTQ-based inference methods that are more accurate, efficient and scalable than existing techniques. The PI will generalize DTQ to Levy-driven SDE, increase DTQ's efficiency through higher-order approximations, design scalable adjoint methods to compute gradients of the likelihood, and apply these adjoint methods to maximize log likelihoods and log posteriors. The PI will also conduct a thorough comparison of SDE inference techniques, including approximate inference through methods such as expectation maximization and variational Bayes. A key part of the project is the development of software infrastructure in the form of open-source packages for use with frameworks such as R, Python, and Apache Spark. These packages will enable all interested researchers, especially those with no advanced training in stochastics, to make use of DTQ-based simulation and inference methods. All new codes developed in this project will be thoroughly documented and tested. The PIs also plan to release the software developed as open source and build a user community around the language by ensuring that interested researchers are able to contribute to the codebase of the software developed. This will allow a wider growth of the project. This aspect is of special interest to the software cluster in the Office of Advanced Cyberinfrastructure, which has provided co-funding for this award.
View original record on NSF Award Search →