
Complexity to Clarity: Nonparametric Procedures that Exploit Structured Data and Models

$450,000 · FY2015 · MPS · NSF

Carnegie Mellon University, Pittsburgh PA


Abstract

Modern research in the physical sciences involves non-standard aggregate data objects (for example, images, spectra, or hurricane tracks) that are not directly amenable to traditional statistical methods. The physical processes that generate these data are also often very complicated, and typically the only meaningful "model" is a high-resolution theoretical or simulation model. In cosmology, for example, scientists regularly use large hydrodynamic simulations to understand how the universe formed and evolved. The goal of this project is to develop a new means of combining careful statistical modeling of scientific phenomena with scalable procedures that fully exploit the richness of large collections of complex data, without reducing the data to a set of features or templates.

Building on ideas from harmonic analysis and spectral methods, the project will develop flexible, adaptive nonparametric methods for high-dimensional inference that exploit sparse (and potentially nonlinear) structure in complex data. These methods derive Fourier-like bases that adapt to the intrinsic geometry (e.g., submanifold structure) of the underlying data distribution, and use the empirical basis functions to estimate functions on high-dimensional aggregate objects. The methods go beyond point estimates in prediction to nonparametric estimation of conditional densities, density ratios, and likelihoods of complex, high-dimensional data. A key application is the calibration of complex simulation models: the inference challenge of determining the input settings of these models so that their output approximates either real data or the output of a more complex, computationally intensive simulation code.
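The abstract does not specify how the adaptive Fourier-like basis is built. As a hedged illustration only, the sketch below assumes one common construction: take the leading eigenvectors of a Gaussian kernel Gram matrix as a data-adaptive orthonormal basis, extend them to new points with the Nyström formula, and estimate a regression function by a truncated series expansion in that basis. The names `gaussian_kernel`, `SpectralSeriesRegressor`, `bandwidth`, and `n_basis` are hypothetical and not taken from the project; the toy data (a circle embedded in 10 dimensions) stand in for a low-dimensional submanifold in a high-dimensional space.

```python
import numpy as np


def gaussian_kernel(a: np.ndarray, b: np.ndarray, bandwidth: float) -> np.ndarray:
    """Gaussian kernel matrix k(a_i, b_j) = exp(-||a_i - b_j||^2 / (2 * bandwidth^2))."""
    sq_dist = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point cancellation.
    return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * bandwidth**2))


class SpectralSeriesRegressor:
    """Hypothetical sketch: regression in a data-adaptive kernel eigenbasis."""

    def __init__(self, n_basis: int = 20, bandwidth: float = 1.0):
        self.n_basis = n_basis
        self.bandwidth = bandwidth

    def fit(self, X: np.ndarray, y: np.ndarray) -> "SpectralSeriesRegressor":
        n = X.shape[0]
        self.X_ = X
        # Eigendecompose the normalized Gram matrix (1/n) K.
        evals, evecs = np.linalg.eigh(gaussian_kernel(X, X, self.bandwidth) / n)
        top = np.argsort(evals)[::-1][: self.n_basis]  # largest eigenvalues first
        self.lambdas_ = evals[top]
        # Basis values at the training points, scaled so that
        # (1/n) * sum_i psi_j(x_i) psi_k(x_i) = delta_jk.
        self.psi_train_ = np.sqrt(n) * evecs[:, top]
        # Expansion coefficients beta_j = (1/n) * sum_i y_i psi_j(x_i).
        self.beta_ = self.psi_train_.T @ y / n
        return self

    def basis(self, X_new: np.ndarray) -> np.ndarray:
        """Nystrom extension: psi_j(x) = sum_i k(x, x_i) psi_j(x_i) / (n * lambda_j)."""
        n = self.X_.shape[0]
        K_new = gaussian_kernel(X_new, self.X_, self.bandwidth)
        return K_new @ self.psi_train_ / (n * self.lambdas_)

    def predict(self, X_new: np.ndarray) -> np.ndarray:
        # f_hat(x) = sum_j beta_j psi_j(x), truncated at n_basis terms.
        return self.basis(X_new) @ self.beta_


if __name__ == "__main__":
    # Toy data: points on a circle (a 1-D submanifold) embedded in 10 dimensions.
    rng = np.random.default_rng(0)
    t = rng.uniform(0.0, 2.0 * np.pi, size=500)
    X = np.zeros((500, 10))
    X[:, 0], X[:, 1] = np.cos(t), np.sin(t)
    y = np.sin(3.0 * t)
    model = SpectralSeriesRegressor(n_basis=15, bandwidth=0.3).fit(X[:400], y[:400])
    mse = np.mean((model.predict(X[400:]) - y[400:]) ** 2)
    print(f"held-out MSE: {mse:.4f}")
```

Because the kernel eigenvectors follow the data rather than the ambient coordinates, the basis on the circle behaves like ordinary Fourier modes in the angle, even though the points live in 10 dimensions. In this sketch `bandwidth` and `n_basis` are fixed by hand; in practice such tuning parameters would typically be chosen by cross-validation or a similar data-driven rule.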
On a broader scale, this work will make key methodological contributions to building, interpreting, and using probability models for high-dimensional, complexly structured data in a wide range of scientific applications.
