Structural Learning and Statistical Inference for Large-Scale Data

$119,998 | FY2020 | MPS | NSF

University of Wisconsin-Madison, Madison, WI

Investigators

Abstract

This project aims to develop new structure learning and statistical inference procedures that capture essential structures of large-scale data emerging from scientific studies in genetics, biology, neuroscience, finance, and meteorology, among others. It will develop new tools for stochastic modeling, computational algorithms, and statistical inference, with applications to multi-channel brain EEG recordings, multi-subject fMRI, and multiple-neuron spike trains in neuroscience research, and to identifying structural changes in climate data and copy number variation in genetics. The outcomes will help scientists efficiently analyze large-scale imaging, temporal, and spatial data, and will thus have broader societal impact through applications to science, public health, and information technology. Dissemination of these developments will foster new knowledge discovery and strengthen interdisciplinary collaborations. The research will also be integrated with educational practice through the design of regular, seminar, or short courses on new statistical approaches for analyzing complex data, and will benefit the training and learning of undergraduate students, graduate students, and underrepresented minorities.

This research focuses on statistical learning of fundamentally distinct types of structures, with the ultimate goal of better understanding complex systems. Motivated by the problem of inferring neural connectivity from ensemble neural spike train data, Project 1 will learn the directed acyclic graph structure in a large Poisson network underlying a wide array of multivariate point process data. The related probabilistic mechanism will provide new insights into the statistical properties of estimators for graph parameters relevant to mining causal relations among neurons. Inspired by feature extraction and source separation from multi-channel brain EEG recordings and by non-linear temporal signal processing, Project 2 will develop a class of non-linear, non-smooth combinations of structured component analysis (SCA) to extract hidden component signals from observed mixed signals. The SCA developed will be more broadly applicable in scientific studies. Motivated by the need to identify and understand structural changes in climate trends and structural variation in gene copy numbers associated with genetic diseases, Project 3 will develop a novel two-step adaptive procedure for jump detection that simultaneously selects the unknown number of jump points and detects their locations in a flexible non-parametric regression model.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
