Powerful and Adaptive Statistical Methods for Sequencing Studies

$75,750 | R03 | FY2016 | HG | NIH

North Carolina State University Raleigh, Raleigh NC

Investigators

Linked publications & trials

Paper 31929665 Paper 27355347

Abstract

DESCRIPTION (provided by applicant): Next-generation sequencing (NGS) data have been generated at an increasing rate over the last few years. Encompassing the full spectrum of genomic variation, they hold the promise of identifying new sources of heritability from rare variants that eluded traditional genome-wide association (GWA) studies. Despite substantial progress in recent years, current methods remain limited in power and robustness for the analysis of NGS data, which are characterized by extremely high dimensionality and low minor allele frequency (MAF). New methods are needed to address these statistical challenges in order to achieve the full potential of NGS data in identifying genetic variations that contribute to missing disease heritability. The goal of this project is to develop powerful and adaptive statistical methods for the analysis of sequencing studies. Specifically, the project aims to (1) develop an adaptive variant-screening procedure that can efficiently account for a large proportion of causal rare variants while significantly reducing the data dimension for follow-up analysis; and (2) provide an objective procedure for sample-size calculation to direct follow-up studies and to pinpoint the causal variants with high confidence. The proposed procedures are very general and can accommodate a wide spectrum of models, test statistics, and data scenarios. They are completely data-driven and can automatically adapt to the underlying sparsity of the data. Moreover, the proposed methods are computationally efficient under extremely high dimensionality. These desirable properties make the proposed methods applicable to a myriad of high-dimensional applications. Rigorous theory will be developed to understand the role of sparsity and extremely high dimensionality in NGS data analysis, and comprehensive simulations will be performed to study the proposed methods.
In addition, this project will provide computationally efficient programs and evaluate the methods using several recent NGS datasets. The programs will be developed in R and efficient Fortran. Our computational package will be made publicly available to allow investigators to apply our procedures widely in sequencing studies.
