Statistical Tools for Whole-Genome Analysis & Prediction of Complex Traits and Diseases

$309,500 | R01 | FY2016 | GM | NIH

Michigan State University, East Lansing MI

Abstract

DESCRIPTION (provided by applicant): The analysis of big genomic data requires specialized software able to cope with challenges emerging from both the high-dimensional nature of the data and the complexity of the underlying biological mechanisms. With NIH support we developed, tested, and now maintain the Bayesian Generalized Linear Regression R library (BGLR, available on CRAN; Pérez and de los Campos 2014), a comprehensive Bayesian statistical software package that implements a large collection of Whole-Genome Regression (WGR) procedures, including shrinkage and variable selection methods for linear models and semiparametric regressions (RKHS). Several studies that have used BGLR to analyze large genomic data sets (with hundreds of thousands of SNPs and thousands of individuals), as well as multi-layer omic data, demonstrate the value of the software. For the renewal of our grant we propose a set of improvements and developments that will make BGLR better suited for the analysis of Big Data and will greatly expand the classes of models implemented. We will develop and implement: (Aim 1) methods to enable BGLR to carry out computations using inputs that are stored in distributed binary files, without fully loading data into RAM, which will open great opportunities for the analysis of big omic data sets; (Aim 2) a BGLR module to fit a diverse array of interaction models, including interactions of categorical (e.g., sex, treatment) or quantitative (e.g., BMI) risk factors with whole-genome data (e.g., SNPs, expression profiles); (Aim 3) methods to incorporate prior information (e.g., annotation) into whole-genome regressions; and (Aim 4) instruments for online training. The successful achievement of our aims will provide researchers with efficient data analysis tools for whole-genome analysis of large omic data sets.

View original record on NIH RePORTER →