Regularized Dimension Reduction for High Dimensional Data
University of Wisconsin-Madison, Madison, WI
Investigators
Abstract
With recent advances in biotechnology such as genome-wide microarrays and high-throughput sequencing, regression-based modeling of high dimensional data in the biological sciences has never been more important. The investigator aims to develop a regularized dimension reduction method for very high dimensional linear regression problems. The main thrust of the research builds on Partial Least Squares (PLS) regression, a well-established dimension reduction technique that has been heavily used in scientific research areas where ill-posed problems commonly arise. The proposed work 1) theoretically investigates the suitability of PLS for very high dimensional regression settings where the number of predictors far exceeds the available sample size; 2) proposes a regularization scheme that promotes variable selection in addition to dimension reduction, constructs rigorous mathematical formulations of the scheme, and characterizes their analytical solutions; and 3) develops an efficient algorithm implementing the proposed framework. Extensions to interrelated classification and censored data settings are also considered.
The proposed work, when completed and disseminated, will provide a powerful framework for simultaneous dimension reduction and variable selection, relevant to all fields of scientific research that involve high dimensional ill-posed regression problems. This will allow scientists to analyze high dimensional data with efficient dimension reduction and increased interpretability. The PI is actively involved in collaborations with biologists, biochemists, geneticists, and medical doctors. The research emanating from this proposal will therefore have a strong interdisciplinary flavor and will be implemented, tested, and tuned to address many real scientific questions of interest. The PI will apply the proposed research to problems arising in the study of variation in gene expression, transcription regulation, and the binding properties of DNA binding proteins, where the selection of relevant variables is as important as excellent predictive power. The project will integrate research and education by working closely with both graduate and undergraduate students.
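To make the idea of combining dimension reduction with variable selection concrete, the following is a minimal sketch, not the method proposed in this award. It computes the first PLS direction vector, which is proportional to X^T y for centered data, and then applies soft thresholding so that small loadings become exactly zero. The abstract does not specify the regularization scheme; the soft-thresholding step, the function names, the `eta` parameter, and the simulated data are all assumptions made for illustration.

```python
import numpy as np


def first_pls_direction(X, y):
    """Return the first PLS direction: the unit vector proportional to X^T y.

    Columns of X and the response y are centered before the product is taken.
    """
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    w = Xc.T @ yc
    return w / np.linalg.norm(w)


def soft_threshold(w, eta):
    """Shrink loadings toward zero (illustrative assumption, not from the abstract).

    Entries whose magnitude is at most eta * max|w| are set exactly to zero,
    which removes the corresponding predictors from the direction vector.
    """
    thresh = eta * np.max(np.abs(w))
    return np.sign(w) * np.maximum(np.abs(w) - thresh, 0.0)


# Simulated "p much larger than n" setting: 30 samples, 500 predictors,
# with only the first 5 predictors affecting the response (hypothetical data).
rng = np.random.default_rng(0)
n, p = 30, 500
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:5] = 3.0
y = X @ beta + 0.1 * rng.standard_normal(n)

w = first_pls_direction(X, y)          # dense direction: all 500 loadings nonzero
w_sparse = soft_threshold(w, eta=0.6)  # sparse direction: most loadings zeroed
selected = np.flatnonzero(w_sparse)    # indices of predictors that survive

print(f"nonzero loadings before: {np.count_nonzero(w)}, after: {selected.size}")
```

The dense direction `w` uses every predictor, which is what makes ordinary PLS hard to interpret when predictors vastly outnumber samples. The thresholded direction keeps only predictors with the strongest covariance with the response, so the resulting latent component is both a low dimensional summary and a variable selector.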