Estimation and Inference for Massive Multivariate Spatial Data

$102,683 | FY2018 | MPS | NSF

Cornell University, Ithaca NY

Investigators

Abstract

Satellite observations of the Earth's atmosphere and oceans have the potential to improve forecasting of hurricanes and other extreme weather events. Large-scale efforts to sample the chemical constituents of well water can reduce uncertainty in maps of hazardous materials in groundwater. Observations of chemical reactions at the sub-micron scale may yield new insights into the behavior of toxic trace elements in soils. However, the value of these expensive data-collection efforts will not be fully realized unless the statistical techniques for analyzing the data keep pace. Currently available techniques are inadequate for flexibly modeling, and extracting information from, massive datasets consisting of many variables collected across a region. This research project aims to develop computationally efficient methods that address the two central challenges in analyzing massive multivariate spatial data: (1) drawing justifiable conclusions about the relationships among the multiple variables, and (2) making full and appropriate use of all variables when mapping the data. Addressing the first challenge is essential for translating observational and experimental data into scientific knowledge. Addressing the second is crucial for predicting potentially harmful outcomes. The key to solving both challenges is integrating multivariate and spatial data analysis into a unified framework. The inherent correlation in time series and spatial data is what makes interpolation and forecasting possible, but it also complicates the estimation of relationships among variables. For this reason, analyses of time series data often begin by transforming the data into the spectral (frequency) domain, where the transformed values are approximately uncorrelated.
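
The decorrelation property can be checked numerically. The sketch below assumes a stationary AR(1) time series, a model chosen purely for illustration (the abstract names no specific model), and compares the largest correlation between two observations with the largest correlation between two distinct coefficients of the unitary discrete Fourier transform (DFT). The function names are hypothetical.

```python
import numpy as np


def ar1_covariance(n, phi):
    """Covariance matrix of n consecutive values of a stationary AR(1) process
    x_t = phi * x_{t-1} + e_t with unit-variance innovations e_t."""
    idx = np.arange(n)
    lags = np.abs(idx[:, None] - idx[None, :])
    return phi ** lags / (1.0 - phi ** 2)


def dft_covariance(cov):
    """Covariance F @ cov @ F^H of the unitary DFT coefficients X = F x."""
    n = cov.shape[0]
    F = np.fft.fft(np.eye(n), norm="ortho")  # unitary (symmetric) DFT matrix
    return F @ cov @ F.conj().T


def max_offdiag_correlation(cov):
    """Largest absolute correlation between two distinct entries of the vector."""
    sd = np.sqrt(np.diag(cov).real)
    corr = np.abs(cov) / np.outer(sd, sd)
    np.fill_diagonal(corr, 0.0)  # ignore each entry's correlation with itself
    return corr.max()


if __name__ == "__main__":
    n, phi = 256, 0.5
    cov = ar1_covariance(n, phi)
    print("time domain:", max_offdiag_correlation(cov))  # lag-1 correlation, phi
    print("DFT domain: ", max_offdiag_correlation(dft_covariance(cov)))
```

Neighboring observations have correlation phi = 0.5, while the largest correlation between distinct Fourier coefficients is far smaller and goes to zero as n grows. The Fourier coefficients are only approximately uncorrelated because a finite, non-periodic record does not have an exactly circulant covariance matrix.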
Although the spectral domain has played a central role in developing the theory of spatial data models, several issues have hindered the implementation of practical spectral-domain methods for spatial data. This project aims to develop methodological innovations that overcome those barriers and give practitioners a flexible set of tools to extract information from dozens of spatial variables simultaneously and to predict variables at unsampled locations using all of the available data. The methods employ computationally efficient periodic data augmentations to simplify analyses; they also dramatically improve the characterization of uncertainty and are supported by novel theoretical results.
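
The abstract does not describe its augmentation scheme. One standard way periodicity simplifies spatial computations is shown here as an assumption-laden 1-D sketch, not as the project's method: place the data on a periodic grid (a circle, or a torus in 2-D), where a stationary covariance matrix is circulant. The FFT diagonalizes such a matrix, so the Gaussian log-likelihood costs O(m log m) operations instead of O(m^3). The sketch assumes the m-point periodic grid is already completely filled (in an augmentation scheme, the unobserved grid points would be imputed). It uses a periodized exponential covariance with a hypothetical `range_param`; all function names are illustrative.

```python
import numpy as np


def periodized_exponential_row(m, range_param):
    """First row of the circulant covariance on a periodic grid of m points.

    Sums the exponential covariance exp(-|lag| / range_param) over all periodic
    images of each lag, which keeps the matrix positive definite on the circle.
    """
    a = np.exp(-1.0 / range_param)
    d = np.arange(m)
    return (a ** d + a ** (m - d)) / (1.0 - a ** m)


def circulant_eigenvalues(row):
    """Eigenvalues of a symmetric circulant matrix: the DFT of its first row."""
    return np.fft.fft(row).real


def periodic_gaussian_loglik(y, row):
    """Log-likelihood of a zero-mean Gaussian vector y on the full periodic grid,
    with circulant covariance given by `row`, computed in O(m log m) time."""
    m = y.size
    eig = circulant_eigenvalues(row)
    y_hat = np.fft.fft(y) / np.sqrt(m)  # unitary DFT of the data
    quad_form = np.sum(np.abs(y_hat) ** 2 / eig)  # y^T C^{-1} y
    log_det = np.sum(np.log(eig))  # log |C|
    return -0.5 * (quad_form + log_det + m * np.log(2.0 * np.pi))


def dense_circulant(row):
    """Explicit m x m circulant matrix, for checking against O(m^3) algebra."""
    m = row.size
    idx = np.arange(m)
    return row[(idx[None, :] - idx[:, None]) % m]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    m, range_param = 64, 5.0
    row = periodized_exponential_row(m, range_param)
    y = rng.standard_normal(m)  # stand-in for a completed (augmented) data vector

    C = dense_circulant(row)
    _, log_det = np.linalg.slogdet(C)
    dense = -0.5 * (y @ np.linalg.solve(C, y) + log_det + m * np.log(2.0 * np.pi))
    print(np.isclose(periodic_gaussian_loglik(y, row), dense))
```

Summing over periodic images is a deliberate design choice. It makes the FFT eigenvalues equal the AR(1)-type spectral density (1 - a^2) / (1 - 2a cos(2πj/m) + a^2), with a = exp(-1/range_param), sampled at the grid frequencies, so all eigenvalues are guaranteed to be positive.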
