Unsupervised and Semisupervised Heterogeneity Analysis Based on Gaussian Graphical Models

$199,661 · FY2022 · MPS · NSF

Yale University, New Haven CT

Investigators

Abstract

Many complex diseases such as cancer are heterogeneous, with seemingly similar patients having different clinical behaviors and varying responses to treatment. To better understand disease biology and to describe and treat diseases more effectively, it is essential to model disease heterogeneity accurately, which has been made possible by the fast accumulation of omics data. Existing studies are limited to analyzing simple distributional properties of the data. This project will advance the paradigm of disease heterogeneity analysis by accommodating how omics measurements are connected. Additionally, the investigator will comprehensively study multiple data scenarios, including when the disease outcome (for example, survival) is completely unknown or known only for some patients, and when additional data (for instance, on demographics and clinical history) are also available. The investigator will develop a set of leading-edge statistical methods and conduct rigorous theoretical and numerical investigations to compare them with existing approaches. This project will fundamentally advance multiple subfields of statistics, including heterogeneity analysis, analysis of high-dimensional data, model selection, and optimization with high-dimensional data. Equally important, applications of the developed methods will lead to more accurate identification of heterogeneous patient groups and their omics characteristics for multiple cancer types. This will facilitate the identification of disease subtypes, treatment selection, and prediction of disease paths, with a direct and profound impact on clinical decision-making. Taking advantage of TCGA (The Cancer Genome Atlas) data, the investigator will deliver important heterogeneity models for lung and skin cancer that are valuable to basic science and clinical researchers. Additionally, this project will benefit the education and training of undergraduate and graduate students at Yale University and foster additional collaborations.
Heterogeneity analysis plays an important role in statistics and biomedicine. The development of high-throughput profiling has made it possible to conduct more informative analysis but has also brought numerous statistical challenges. Many commonly used methods are limited to marginal measures, especially the mean and variance. In this project, building on a recent successful heterogeneity analysis based on the Gaussian graphical model (GGM), the investigator will systematically develop GGM-based unsupervised and semisupervised heterogeneity analysis. In particular, the investigator will examine complicated scenarios involving latent effects and regulating effects, as well as heterogeneity analysis under a hierarchy. A series of leading-edge methods built on the penalized fusion technique will be developed. The consistency properties of the developed methods will be established under ultrahigh-dimensional settings. The project will also develop efficient computational algorithms and conduct extensive simulations and comparisons. The investigator plans to analyze the TCGA (The Cancer Genome Atlas) data on lung and skin cancer and deliver heterogeneity models along with variable selection and model estimation results. Statistical investigations under this project will provide broad insight into high-dimensional statistics, heterogeneity modeling, penalization, and network-based analysis. The data analysis will significantly advance the field of cancer omics. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
