Coarse-to-fine Discovery for Genetic Association
Johns Hopkins University, Baltimore MD
Abstract
Mendelian traits are governed by single genes, and methods to identify these genes have been remarkably successful. In stark contrast, complex traits result from multiple genetic variants that are individually neither necessary nor sufficient and that often interact with each other and with the environment. Indeed, the genetic variants identified to date for a complex trait typically explain, collectively, less than 10% of its phenotypic variance. No unified computational and statistical framework has been advanced for organizing the discovery process. To date, a single strategy has dominated: static, variant-by-variant analysis. In contrast, the investigators propose a new coarse-to-fine statistical framework motivated by the biomedical hypothesis that the mutations contributing to a given disease are concentrated in specific pathways, and in specific genes within those pathways. Simulations demonstrate that, under this hypothesis, multi-scale, hierarchical, coarse-to-fine sequential tests have greater power than conventional methods. The researchers formalize these heuristics mathematically and provide a comprehensive analysis, both empirical and theoretical, of the trade-offs that result from introducing carefully chosen biases about how active variants are distributed within genes and pathways. The new methods are applied to data from large-cohort genome-wide association studies (GWAS) to validate their utility.
Knowing the genetic variants that contribute to cardiovascular disease, diabetes, autism, and other prevalent disorders would be of great value for identifying drug targets, predicting who is at risk, and suggesting personalized therapies. These diseases, however, are not caused by mutations in single genes but by multiple mutations that combine to disrupt multi-gene biological pathways.
The investigators therefore develop a new statistical framework that begins the search for disease-risk genes at the pathway level and then sequentially narrows it to genes within pathways and to alleles within genes. Successful applications to ongoing human genetic studies involving tens to hundreds of thousands of people identify genes contributing to cardiovascular disease. More generally, the coarse-to-fine framework is valuable in the current era of "big data", in which ever-larger data volumes call for innovative statistical methods.
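The pathway-to-gene-to-allele narrowing described above can be illustrated with a small top-down search. This is only a sketch under stated assumptions, not the investigators' method: the per-level test (Fisher's method for combining p-values), the Bonferroni-style split of the significance level among the children of each selected node, and the input layout and names (`Hierarchy`, `fisher_combined_p`, `coarse_to_fine`) are all hypothetical choices made for illustration. Its point is the gating: genes are tested only inside pathways that pass, and alleles only inside genes that pass. No claim is made here about the error-rate guarantees of this particular scheme.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical input layout: pathway -> gene -> per-variant p-values
# (e.g. from single-variant association tests). All p-values must lie in (0, 1],
# and every pathway and gene is assumed to be non-empty.
Hierarchy = Dict[str, Dict[str, List[float]]]


def fisher_combined_p(pvalues: List[float]) -> float:
    """Fisher's method: X = -2 * sum(ln p) is chi-square with 2k df under H0.

    With even degrees of freedom 2k, the chi-square survival function has the
    closed form exp(-x/2) * sum_{i<k} (x/2)^i / i!, so only the stdlib is needed.
    """
    k = len(pvalues)
    half_x = -sum(math.log(p) for p in pvalues)  # this is x/2
    term, total = 1.0, 1.0  # i = 0 term of the series
    for i in range(1, k):
        term *= half_x / i  # builds (x/2)^i / i! incrementally
        total += term
    return math.exp(-half_x) * total


def coarse_to_fine(data: Hierarchy, alpha: float = 0.05) -> List[Tuple[str, str, int]]:
    """Return (pathway, gene, variant_index) triples selected by a top-down search."""
    hits: List[Tuple[str, str, int]] = []
    a_path = alpha / len(data)  # split alpha evenly across pathways
    for pathway, genes in data.items():
        pathway_p = [p for ps in genes.values() for p in ps]
        if fisher_combined_p(pathway_p) > a_path:
            continue  # pathway not selected: its genes and variants are never tested
        a_gene = a_path / len(genes)  # split the pathway's level among its genes
        for gene, ps in genes.items():
            if fisher_combined_p(ps) > a_gene:
                continue  # gene not selected: its variants are never tested
            a_var = a_gene / len(ps)  # split the gene's level among its variants
            for j, p in enumerate(ps):
                if p <= a_var:
                    hits.append((pathway, gene, j))
    return hits


if __name__ == "__main__":
    # Toy, made-up p-values: one strong signal in gene_1 of pathway_A.
    toy = {
        "pathway_A": {"gene_1": [1e-6, 0.2, 0.03], "gene_2": [0.4, 0.5]},
        "pathway_B": {"gene_3": [0.6, 0.7], "gene_4": [0.9]},
    }
    print(coarse_to_fine(toy))  # prints [('pathway_A', 'gene_1', 0)]
```

On the toy input, pathway_B and gene_2 are discarded at the coarse levels, so their variants never consume any of the significance budget. That is the mechanism by which concentrating the search in promising pathways and genes can, under the clustering hypothesis, buy power relative to testing every variant independently.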