Genomics GPUs and next generation computational statistics

$359,971 · R01 · FY2011 · HG · NIH

University Of California Los Angeles, Los Angeles CA

Abstract

DESCRIPTION (provided by applicant): With computational demands in genetics growing exponentially, concerns are rising about whether traditional CPUs can deliver the needed computing power. Parallel computing has been touted for several years, but massively parallel CPU computers are enormously expensive and limited to a few national centers. Graphics processing units (GPUs) offer a far cheaper and more distributed solution. Hundreds of these units are fabricated on a single card, and several cards fit inside a desktop computer. Thus, cheap hardware currently exists that promises a hundred-fold speedup of many basic algorithms. Projections from the vendors of GPUs suggest that these devices will grow rapidly in computational power and versatility over the next decade. Thus, software development is the main hurdle hindering the exploitation of GPUs. This proposal targets this weak link in the chain of modern computing. Through a series of demonstration projects and the production of low-level software libraries, we hope to catalyze the spread of GPUs in genetics. The specific projects include: 1) eQTL mapping, 2) variance component models for QTL mapping, 3) genotype and haplotype construction, 4) estimation of ethnic admixture, 5) isoform discovery through RNA-Seq technology, 6) computation of genetic landscapes and clines, 7) construction of gene networks from random multigraphs, and 8) design of new parallel algorithms for data mining. High-dimensional optimization is a common thread enabling all of these applications. Our previous research on optimization has demonstrated the efficacy of four fundamental ideas, namely, penalized estimation, coordinate descent, the MM (majorization-minimization) principle, and separation of parameters. These ideas also propel parallel computing. Implementation of our demonstration projects on GPUs will require the production of subroutines of considerable general value in computational statistics.
We intend to release our toolbox libraries to the open source community, including C/C++, Fortran, and R software wrappers. This may lead to a multiplier effect that will improve the computing climate in many disciplines throughout the health and physical sciences. All other application programs produced under this proposal will be freely distributed to the scientific community. Our record of producing and distributing usable software with superior documentation shows our commitment to this philosophy.

PUBLIC HEALTH RELEVANCE: The human genome project and its offshoots have dramatically increased the amount of genetic data. In fact, our ability to collect genetic information has currently far outstripped our ability to make use of this information in understanding the basis of disease and human diversity. Our aim is to develop, implement, and freely distribute new, more efficient computational and statistical approaches that make full use of the vast amount of genetic data, and thus improve genetic researchers' ability to map and characterize genes that lead to human diseases and to trait variation.
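
Illustrative sketch (not part of the applicant's abstract): the description names penalized estimation and coordinate descent among its core optimization ideas. As a rough illustration of how those two ideas combine, the following is a generic textbook version of cyclic coordinate descent for lasso-penalized least squares, written in plain Python on the CPU. It is not the project's software, and the function names `soft_threshold` and `lasso_coordinate_descent` are hypothetical, chosen only for this example.

```python
def soft_threshold(z, t):
    """Shrink z toward zero by t; this is the closed-form lasso update."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_sweeps=200):
    """Minimize 0.5 * ||y - X b||^2 + lam * ||b||_1 one coordinate at a time.

    X is a list of n rows (each a list of p floats); y is a list of n floats.
    Assumption: a fixed number of sweeps stands in for a convergence test.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X @ beta, since beta starts at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) for j in range(p)]

    for _ in range(n_sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column carries no information
            # Inner product of column j with the residual that excludes beta[j].
            rho = sum(X[i][j] * residual[i] for i in range(n)) + col_sq[j] * beta[j]
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                # Keep the residual in sync with the updated coefficient.
                for i in range(n):
                    residual[i] -= delta * X[i][j]
                beta[j] = new_bj
    return beta


if __name__ == "__main__":
    # Orthogonal toy design: the penalty zeroes out the weak second predictor.
    print(lasso_coordinate_descent([[1.0, 0.0], [0.0, 1.0]], [3.0, 0.5], lam=1.0))
```

Each coordinate update reduces to sums over the samples followed by a scalar shrinkage step. Sums of that kind are reductions that can in principle be split across many processing units, although this sketch makes no attempt to do so.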
