
Collaborative proposal: Variable Selection in the high dimensional, low sample size setting -- Beyond the Linear Regression and Normal Errors Model

$200,000 | FY2016 | MPS | NSF

Cornell University, Ithaca NY

Abstract

Revolutionary new technologies are producing high-throughput biological data at a resolution that was unthinkable only a decade ago. These new forms of data pose enormous challenges and opportunities for statisticians and computer scientists. This project develops sophisticated new statistical methods and computational algorithms for analyzing and integrating complex high-dimensional data. The work is motivated by collaborations with leading biological scientists at Cornell-Ithaca and Weill Cornell Medical College working in diverse research areas, including plant biology, nutrition, neurology, cancer epigenomics, and veterinary medicine.

The goal of this project is to develop new statistical models and computational algorithms for high-dimensional, low-sample-size, high-throughput biological data, including new methods for the analysis of microarrays, the identification of quantitative trait loci, association mapping, and label-free shotgun proteomics and metabolomics. The proposed methods involve innovative extensions of modern statistical building blocks, including the use of random effects for regularization, shrinkage estimation, Bayesian statistics, and mixtures for posterior classification and prediction. Novel modifications of the expectation-maximization algorithm are proposed for scalable and efficient model fitting and inference.
