CC* Compute: BioBurst

$494,066 | FY2017 | CSE | NSF

University Of California-San Diego, La Jolla CA

Investigators

Ronald B Hawkins (contact), Robert Sinkovits, Shawn M Strande, Theresa Gaasterland

Abstract

The goal of the project is to deploy the BioBurst system to enhance the high performance computing capabilities at the University of California, San Diego, with technology designed to accelerate biological and life sciences research. The last few years have seen revolutionary advances in sequencing instruments for decoding genetic materials such as deoxyribonucleic acid (DNA) and ribonucleic acid (RNA). DNA and RNA carry the genetic code and control the production of proteins essential to life. A byproduct of this revolution in DNA/RNA sequencing technology is the production of vast amounts of data that must be stored and analyzed in order to achieve scientific progress. The way these analyses are conducted stresses existing research computing systems in ways that must be overcome in order to expand the scope of investigations and reduce the time to results. The BioBurst system aims to augment the campus research computing system with innovative technology to speed up both data access and computation on DNA/RNA sequence data. A better understanding of DNA and RNA has the potential to advance our Nation's health and well-being, enabling applications such as new insights into the biological mechanisms that cause disease and the development of new biofuels and agricultural products. The technical goal of the project is to implement a separately scheduled partition of the existing campus research computing system with technology designed to address important classes of bioinformatics computing, including genomics, transcriptomics, and immune receptor repertoire analysis.
The BioBurst system will incorporate the following major components: (1) an I/O acceleration appliance with 40 terabytes of non-volatile memory and software designed to alleviate the small-block/small-file I/O problem characteristic of many bioinformatics codes; (2) an FPGA-based computational accelerator node that has been demonstrated to perform demultiplexing, read mapping, and variant calling of complete human genomes in 22 minutes; (3) 672 commodity computing cores that will access the I/O accelerator and provide a separately scheduled resource for running bioinformatics applications; (4) integration with a large-scale parallel file system, which supports streaming I/O and has the capacity to stage the large amounts of data associated with many bioinformatics studies; and (5) customization of the job scheduler to accommodate bioinformatics workflows, which can consist of hundreds to thousands of jobs submitted by a single user at one time. These components will be integrated as a partition of the existing production research computing system, providing a unique and highly usable resource for researchers across campus. A key objective is to provide bulk computing capacity to conduct on the order of 8,000 whole-genome analyses per year, plus the ability to run quick-turnaround (under 60 minutes) single-genome analyses, and sufficient solid state disk (SSD) capacity for staging the associated working sets (200 GB to 1 TB).
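
The capacity figures quoted above can be related with some rough arithmetic. The following is only a back-of-the-envelope sketch, not part of the award record. It rests on two assumptions the abstract does not state: that every whole-genome analysis runs serially on the single FPGA node at the demonstrated 22 minutes, and that the 40-terabyte appliance is what holds the staged working sets. The function names are hypothetical.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def accelerator_busy_fraction(genomes_per_year: int, minutes_per_genome: float) -> float:
    """Fraction of the year one accelerator is busy if runs are strictly serial.

    Assumption: all analyses go through one node, with no queueing or I/O overhead.
    """
    return genomes_per_year * minutes_per_genome / MINUTES_PER_YEAR


def working_sets_that_fit(capacity_gb: int, working_set_gb: int) -> int:
    """Number of whole working sets of a given size that fit in the capacity.

    Assumption: decimal units (1 TB = 1000 GB) and no space reserved for anything else.
    """
    return capacity_gb // working_set_gb


if __name__ == "__main__":
    busy = accelerator_busy_fraction(8_000, 22)
    print(f"Accelerator busy fraction: {busy:.1%}")
    print("200 GB working sets in 40 TB:", working_sets_that_fit(40_000, 200))
    print("1 TB working sets in 40 TB:", working_sets_that_fit(40_000, 1_000))
```

Under these assumptions, 8,000 analyses at 22 minutes each would occupy a single accelerator for roughly a third of the year, and 40 terabytes would hold between 40 and 200 working sets at once. Real throughput would depend on scheduling, data staging, and how the work is split between the accelerator and the commodity cores.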
