Learning Complex Stochastic Systems

$200,000 · FY2023 · MPS · NSF

Louisiana State University, Baton Rouge LA

Investigators

Abstract

Differential equations are often used to model the temporal evolution of a variety of systems. However, most realistic systems, including those arising in biology, environmental science, engineering, physics, medicine and financial markets, exhibit randomness in their behavior. Accurate analysis of such systems thus needs differential equations that can incorporate this randomness, and stochastic differential equations are powerful tools for this purpose. Understanding the behavior of these systems requires not just building mathematical models but also integrating them with available data, which in turn requires various types of learning algorithms. It is important to judge the effectiveness of these algorithms by rigorous mathematical analysis, which is the primary objective of this project. The dynamics of these stochastic systems are, however, intricate, with convoluted correlation structures, and there is a critical lack of mathematical results in the literature investigating learning methods for such complex data. The work done by the investigator will fill some of this gap by deriving mathematical results that not only answer whether the algorithms become more accurate as data are observed over longer periods of time but also provide valuable insight into how to fine-tune the key parameters for optimal efficiency. Building such data-driven stochastic models backed by rigorous mathematics enhances our understanding of complex systems across multiple domains and empowers informed decision-making in the presence of randomness. The project will involve undergraduate and graduate students and will teach them valuable skills through a combination of theoretical knowledge, practical application, and hands-on experience with coding. It will enable them to excel in the digital age and adapt to the demands of an increasingly data-driven and technologically advanced world. The results of the project will be disseminated through publications in well-known scientific journals and presentations at domestic and international conferences.
The project will study important learning problems for a broad class of stochastic differential equations (SDEs). These problems lie at the interface of stochastic analysis and statistical learning theory, and there is a paucity of theoretical results addressing them in the probability, statistics and machine learning literature. The project is divided into three interconnected parts, each of which plays an important role in the others. Part I will address important problems in parametric inference, including point estimation and hypothesis testing. It will derive asymptotic results, including laws of large numbers, central limit theorems and large deviation principles, for estimators of a finite-dimensional parameter of a broad class of SDEs. Unlike some existing works in this direction, which assume that the data take the form of a continuous trajectory, the investigator's work will consider the realistic case in which only discrete data points are available. Since asymptotic analysis requires the time horizon to go to infinity, the effect of the time gap (or discretization step) between observations on the long-time accuracy of these estimators is not clear, and it is known that naive discretization of estimators based on a continuous trajectory of an SDE can lead to erroneous inference. The project will introduce appropriate scaling frameworks to quantify this effect and analyze the errors in different scaling regimes. Next, these results will be used to design tests for composite hypothesis-testing problems for which the probability of type I error decays rapidly and which are asymptotically uniformly most powerful within a class of tests having a similar level of type I error. Part II of the project concerns the important topic of decision-making. Decision-making involves (constrained) minimization of suitable cost functions that depend on model parameters. Since these parameters are unknown, data-driven versions of such minimization problems are necessary in practice. In particular, it is necessary to construct suitable estimators of the cost functions so that decisions based on their minimization are close to the true decisions. The investigator will study a novel approach, based on large deviation analysis and the results of Part I, which aims to guarantee that under appropriate conditions this can be achieved with very high probability. Part III is devoted to nonparametric learning of SDEs. This last part falls in the realm of infinite-dimensional learning theory, where the goal is to learn the entire driving functions of the SDE-based models as opposed to estimating finite-dimensional parameters. A rigorous computational framework combining Bayesian techniques with the theory of reproducing kernel Hilbert spaces will be developed toward this end, and the theoretical properties of the resulting learning algorithms will also be studied.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
