Fast and powerful extensions of mixed model methods for GWAS

$56,118 · F32 · FY2016 · HG · NIH

Harvard School of Public Health, Boston, MA

Investigators

Linked publications & trials

Paper 27694958, Paper 27270109, Paper 26924531, Paper 26562831, Paper 26544803, Paper 26523775, Paper 25892111, Paper 25642633, Paper 25642630

Abstract

DESCRIPTION (provided by applicant): Genome-wide association studies (GWAS) have improved our understanding of the genetic architectures of many complex diseases and hold the promise of identifying genomic loci of causal variants and enabling accurate genetic risk prediction. However, because most traits of medical interest are influenced by a multitude of genetic factors, each of which explains only a small fraction of heritability, cohort sizes on the scale of hundreds of thousands of individuals will be necessary to provide the statistical power required to detect these elusive associations. This proposal aims to develop fast and powerful statistical methods addressing key challenges that arise in modeling such large-scale data sets: correcting for subtle confounding from population stratification or cryptic relatedness among study participants while maintaining computational tractability. The current state-of-the-art approach to association testing uses linear mixed models to simultaneously model the effects of all markers while accounting for sample structure. Existing mixed model techniques are computationally expensive, however, and also assume that all markers have nonzero effects. This proposal aims to extend mixed model methods by developing and implementing a new well-calibrated mixed model statistic that can be computed very quickly and tailored to more realistic genetic architectures. The first specific aim is to develop a novel method that analyzes linkage disequilibrium patterns to calibrate mixed model association test scores, distinguishing genome-wide inflation of test statistics due to sample structure from perceived inflation that is actually the true result of many causal loci. This method will safeguard against the alternative dangers of false positive associations from confounding or power loss from overly conservative calibration.
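As a rough illustration of the first aim, one way to separate the two sources of inflation is to regress per-marker test statistics on a per-marker measure of linkage disequilibrium. The sketch below assumes a simple linear relationship, chi2_j ≈ intercept + slope × ld_score_j, fitted by unweighted least squares. The abstract does not specify this form, and the function name, variable names, and toy data are all hypothetical. Under that assumption, an intercept above 1 would point to inflation from sample structure, while a positive slope would reflect signal spread across many causal loci.

```python
import numpy as np

def ld_regression(chi2, ld_scores):
    """Fit chi2_j ~ intercept + slope * ld_score_j by ordinary least squares.

    Hypothetical helper: the linear form and the lack of weighting are
    simplifying assumptions, not the method described in the proposal.
    """
    chi2 = np.asarray(chi2, dtype=float)
    ld_scores = np.asarray(ld_scores, dtype=float)
    # Design matrix with a constant column (intercept) and the LD scores.
    design = np.column_stack([np.ones_like(ld_scores), ld_scores])
    coef, *_ = np.linalg.lstsq(design, chi2, rcond=None)
    return float(coef[0]), float(coef[1])

# Toy, noise-free data built from a known intercept and slope.
ld_scores = np.linspace(1.0, 200.0, 500)
chi2 = 1.05 + 0.02 * ld_scores
intercept, slope = ld_regression(chi2, ld_scores)
```

On real data the statistics are noisy and correlated across nearby markers, so a practical version would likely need weighting and a resampling-based standard error.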
The second aim is to develop a fast algorithm that applies modern iterative methods for numerical linear algebra to reduce the computational complexity of mixed model association testing to linear in the data size. This advance will enable mixed model analysis to remain feasible as study sizes increase, unlocking associations from rare or small-effect variants. The third aim is to extend the method to model genetic architectures in which most markers have no disease association, as is widely believed, thereby improving statistical power. All of these techniques will be validated in simulation, implemented in software released to the scientific community, and applied to real GWAS data sets to search for additional associations that reach significance.
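To make the second aim concrete, the sketch below solves a mixed-model-style linear system (X Xᵀ / M + δI) v = y with conjugate gradient, using only matrix-vector products with the genotype matrix X. The abstract says only "modern iterative methods", so the choice of conjugate gradient is an assumption, as are the function name, the parameter `delta`, and the random toy data. Each iteration touches X twice, so its cost scales with the number of entries in X, and the N × N relationship matrix is never formed.

```python
import numpy as np

def solve_mixed_model_system(X, y, delta, tol=1e-10, max_iter=1000):
    """Solve (X X^T / M + delta * I) v = y by conjugate gradient.

    X is an N x M matrix (samples by markers). Hypothetical sketch:
    the system and the solver are assumptions for illustration.
    """
    n, m = X.shape

    def apply_H(v):
        # Two matrix-vector products, O(N * M); X X^T is never built.
        return X @ (X.T @ v) / m + delta * v

    v = np.zeros(n)
    r = y - apply_H(v)          # initial residual (equals y here)
    p = r.copy()                # initial search direction
    rs_old = r @ r
    y_norm = np.linalg.norm(y)
    for _ in range(max_iter):
        # Stop once the residual is small relative to the right-hand side.
        if np.sqrt(rs_old) <= tol * y_norm:
            break
        Hp = apply_H(p)
        alpha = rs_old / (p @ Hp)
        v += alpha * p
        r -= alpha * Hp
        rs_new = r @ r
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return v

# Toy data: 50 samples, 200 markers, standard normal entries.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 200))
y = rng.standard_normal(50)
v = solve_mixed_model_system(X, y, delta=1.0)
```

The number of iterations depends on the conditioning of the system, so the overall cost is roughly linear in the data size only when that count stays modest.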
