Optimal tests for weak, sparse, and complex signals with application to genetic association studies
Worcester Polytechnic Institute, Worcester MA
Investigators
Abstract
Detecting sparse and weak signals is key to analyzing big data in many fields. Recent statistical research has made celebrated theoretical progress in revealing the detectability boundaries under the Gaussian means model and an idealized linear regression model. The detectability boundary is the border in the two-dimensional phase space of signal sparsity and weakness below which signals are asymptotically too weak and sparse to be detected by any statistical method. Certain statistics are optimal for these models in the sense that they reach the boundary, i.e., they succeed under the least stringent requirements for reliable signal detection. However, significant gaps remain between these theoretical models and practically meaningful models. In this project, the investigators extend statistical theory to handle weak, sparse, correlated, and interacting signals under the framework of generalized linear models. The investigators develop optimal testing procedures that address the realistic data features of genome-wide association studies and next-generation sequence studies.
Developing statistical theory and methodology for detecting weak and sparse signals is foundational for analyzing big data. The goal of this project is to extend statistical theory to complex signals that are correlated and that interact in their influence on quantitative or categorical responses. This study is of great interest in data science and is critical to many applications. For example, one perplexing problem in current genetic studies is the missing heritability of complex traits: much of the heritability remains unexplained even after many genetic factors have been identified. The proposed work specifically addresses the features of those hidden disease genes yet to be discovered.
Unlike some genetic studies based on heuristic arguments, this research combines rigorous statistical theory, first-hand practice in the field, and cutting-edge data from genome-wide association studies and next-generation sequence studies. The proposed project is highly promising in the hunt for the missing heritability. Improved gene-detection techniques will help to identify more causative genes of complex human diseases, which will help to elucidate disease pathogenesis and guide the design of targeted therapeutics, and will thus have a far-reaching impact on improving quality of life.
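As a concrete illustration of the detectability boundary mentioned in the abstract, the sketch below evaluates the standard boundary known from the literature for the sparse Gaussian means model. The abstract itself does not state a formula, so the calibration here is an assumption: among n tests, a fraction n^(-beta) carry a signal (1/2 < beta < 1), each with mean sqrt(2 r log n). The function names `detection_boundary` and `is_detectable` are hypothetical, and the sketch does not cover the correlated or generalized linear model settings that the project targets.

```python
import math


def detection_boundary(beta: float) -> float:
    """Boundary r*(beta) for the sparse Gaussian means model.

    Assumed calibration (not from the abstract): a fraction n**(-beta)
    of the n means are nonzero, each equal to sqrt(2 * r * log(n)).
    Signals with r below the returned value are asymptotically
    undetectable by any test.
    """
    if not 0.5 < beta < 1.0:
        raise ValueError("beta must lie strictly between 1/2 and 1")
    if beta <= 0.75:
        # Moderately sparse regime.
        return beta - 0.5
    # Very sparse regime.
    return (1.0 - math.sqrt(1.0 - beta)) ** 2


def is_detectable(beta: float, r: float) -> bool:
    """True if (beta, r) lies strictly above the boundary."""
    return r > detection_boundary(beta)
```

For example, `is_detectable(0.6, 0.2)` is true because the boundary at beta = 0.6 is 0.1, whereas `is_detectable(0.9, 0.3)` is false because the boundary at beta = 0.9 is roughly 0.47.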