Genomics, GPUs, and Next Generation Computational Statistics

$382,969 · R01 · FY2015 · HG · NIH

University Of California Los Angeles, Los Angeles CA

Abstract

DESCRIPTION (provided by applicant): With the size of genetic data sets and their computational demands growing exponentially, concerns are rising about whether traditional statistical approaches and standard CPUs can deliver the needed analytical and computing power. Parallel computing has been touted for several years, but massively parallel CPU computers are enormously expensive and limited to a few national centers. Graphics processing unit (GPU) and many integrated core (MIC) coprocessors offer a far cheaper and more widely distributed solution. Each GPU or MIC card can run hundreds of computational threads simultaneously, and several cards fit inside a desktop computer. Today, almost all new laptop and desktop computers are equipped with multiple CPU cores and some form of GPU coprocessor. Thus, cheap hardware already exists that promises a hundred-fold speedup of many basic computational procedures. Appropriate algorithm design and software development are the main hurdles to exploiting GPUs and MICs, and this proposal targets that weak link in the chain of modern computing. By demonstrating the advantages of massively parallel processing on a few genetic problems, and by distributing general low-level software libraries for these and many other problems, we hope to catalyze the use of GPUs and MICs in genetics. The specific projects include the use of RNA-seq data for the discovery and analysis of isoforms, pedigree-informed genotype imputation, and analysis of phenotype evolution in pathogens. High-dimensional optimization is a common thread enabling these applications. We will pursue a promising new optimization technique that is particularly well adapted to high dimensions and parallelization: the proximal distance algorithm. This procedure avoids major pitfalls of current state-of-the-art methods, especially shrinkage, which distorts parameter estimates and model selection.
Implementing our demonstration projects on GPUs and MICs will require producing subroutines of considerable general value in computational statistics. We intend to release our toolbox libraries to the open-source community, including C/C++, Fortran, and R software wrappers. This may lead to a multiplier effect that will improve the computing climate in many disciplines throughout the health and physical sciences. All other application programs produced under this proposal will be freely distributed to the scientific community. Our record of producing and distributing usable parallel software with superior documentation shows our commitment to this philosophy.
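The abstract names the proximal distance algorithm without showing how it works. The following is a minimal sketch under assumptions of our own, not the applicants' software or any library's API: it takes a least-squares loss f(β) = ½‖y − Xβ‖², a constraint set C of vectors with at most k nonzero entries, and a mild geometric schedule for the penalty constant ρ. The idea is to minimize f(β) + (ρ/2)·dist(β, C)² while slowly increasing ρ; at each step the squared distance is majorized by ‖β − P_C(β_n)‖², where P_C(β_n) is the projection of the current iterate onto C, so every update reduces to a ridge-like linear solve. The function names (`project_sparse`, `solve`, `proximal_distance_lsq`) and the default parameters are hypothetical.

```python
# Hypothetical sketch of a proximal distance iteration for k-sparse least squares.
# Assumed setup (not from the grant): f(b) = 0.5*||y - X b||^2,
# C = {b : at most k nonzero entries}, rho multiplied by rho_mult each iteration.
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def project_sparse(v: Vector, k: int) -> Vector:
    """Euclidean projection onto C: keep the k largest |v_i|, zero the rest."""
    keep = sorted(range(len(v)), key=lambda i: abs(v[i]), reverse=True)[:k]
    out = [0.0] * len(v)
    for i in keep:
        out[i] = v[i]
    return out


def solve(a: Matrix, b: Vector) -> Vector:
    """Solve a x = b by Gaussian elimination with partial pivoting (inputs are copied)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / m[r][r]
    return x


def proximal_distance_lsq(X: Matrix, y: Vector, k: int, rho: float = 1.0,
                          rho_mult: float = 1.05, iters: int = 300) -> Vector:
    """Minimize 0.5*||y - X b||^2 + (rho/2)*dist(b, C)^2 by MM while increasing rho.

    Majorizing dist(b, C)^2 by ||b - P_C(b_n)||^2 gives the update
    b_{n+1} = (X^T X + rho I)^{-1} (X^T y + rho * P_C(b_n)).
    """
    n, p = len(X), len(X[0])
    xtx = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(p)] for i in range(p)]
    xty = [sum(X[r][i] * y[r] for r in range(n)) for i in range(p)]
    b = [0.0] * p
    for _ in range(iters):
        z = project_sparse(b, k)  # P_C of the current iterate
        lhs = [[xtx[i][j] + (rho if i == j else 0.0) for j in range(p)] for i in range(p)]
        rhs = [xty[i] + rho * z[i] for i in range(p)]
        b = solve(lhs, rhs)
        rho *= rho_mult  # slowly tighten the penalty
    return project_sparse(b, k)  # return a point that lies exactly in C


if __name__ == "__main__":
    # With X = I the constrained optimum keeps the two largest entries of y.
    identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    print(proximal_distance_lsq(identity, [3.0, 0.1, -2.0, 0.2], k=2))
```

Because the projection simply zeroes all but the k largest coefficients, the surviving coefficients are not pulled toward zero the way an ℓ1 (lasso) penalty pulls them; in the identity example the result should approach (3, 0, -2, 0), with the two largest entries of y left at full size. If ρ grows too quickly the iterates can stall short of the constrained optimum, which is why the multiplier here is deliberately mild. The dense Gaussian elimination is used only to keep the sketch free of dependencies.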
