Inference for Functionals in High-Dimensional Regression
Harvard University, Cambridge MA
Abstract
Modern science and engineering applications involve large datasets with a multitude of variables or features. A key challenge in this context is to distinguish the scientifically relevant variables from the irrelevant ones, in other words, the signal from the noise. The challenge is compounded by subtle nonlinear relationships among these variables. Generalized linear models are the most commonly used tools in classical statistics for discovering such nonlinear relationships, and they are routinely employed even in contemporary big data settings. Unfortunately, the classical statistical theory traditionally used to justify the validity of these methods fails in this high-dimensional regime. This project will develop novel approaches for inferring scientifically relevant parameters in the framework of generalized linear models, adapted to the setting of high-dimensional or big data. The theory developed will facilitate principled inference regarding the relations among observed variables in applications such as genomics, computational neuroscience, and signal and image processing. The principal investigator will also engage graduate students in the project by mentoring them, and will develop courses that incorporate results from this project.

This research project will develop statistical theory and methods for inferring scientifically relevant low-dimensional functionals in high-dimensional generalized linear models, organized around two broad thrusts: (1) frequentist inference for signal-to-noise ratio type functionals; (2) Bayesian inference for functionals under continuous shrinkage priors. The first thrust will develop novel estimators for the signal-to-noise ratio and for the genetic relatedness, a generalization of the signal-to-noise ratio that measures the shared genetic basis between multiple traits in statistical genetics. The second thrust will construct data-driven credible intervals for components of the underlying signal under computationally tractable continuous shrinkage priors. Both thrusts will develop inference procedures that are agnostic to the sparsity level of the underlying signal. To achieve this, the research will focus on the proportional asymptotics high-dimensional regime and will utilize novel insights from approximate message passing theory, developed originally in probability, information theory, and statistical physics.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.