AF: Medium: A High Performance Computing Foundation to Whole-Genome Prediction

$774,000 | FY2015 | CSE | NSF

University of Connecticut, Storrs, CT

Investigators

Abstract

The premise of personalized medicine is the prediction of an individual's genetic risk of disease. Modern animal and plant breeding programs select individuals or lines based on genotypic information, which circumvents the costly process of progeny testing and leads to greater efficiency. In these scientific areas, the ability to translate genotypic information into a quantitative prediction of disease risk or breeding targets is of utmost importance. To address the technical barriers to prediction from a whole-genome sample of genetic markers, there is an urgent need for new statistical models and high performance computing foundations that allow the concurrent use of millions of genetic markers and a large variety of variables describing a disease (or a breeding target). This project proposes to overcome several such barriers through an integrative approach that combines and develops techniques for data reduction, parallel computing and Bayesian inference. This interdisciplinary project provides educational opportunities for graduate and undergraduate students to gain first-hand research experience in computational aspects of genomic data analysis.
This project aims to understand how genome-wide markers help to predict not-yet-specified phenotypes of individuals and how the total genetic contribution to a phenotype can be better estimated. The primary goals of the proposed research are to develop: (1) parallel algorithms to reduce data comprising millions of genetic markers to lower dimensions; (2) sparse predictive modeling with correction for the uneven tagging issue due to linkage disequilibrium; (3) fast algorithms for multi-locus mapping problems; and (4) collaborative prediction methods to jointly predict multiple phenotypes.
The proposed solutions will be tested in the analysis of large-scale biological data, including a dairy cattle database collected by the US Department of Agriculture and a dataset aggregated from multiple genetic studies of human diseases. This project will yield user-friendly software tools that can be broadly deployed in biological research areas that study the genetics of complex phenotypes. The validated methods and software will be disseminated through the PI's laboratory website.
