CC* Integration-Small: Harnessing FABRIC for Scalable Human Genome Sequence Analysis
University of Missouri-Columbia, Columbia, MO
Investigators
Abstract
Human genomes contain valuable information that can enable new ways of treating life-threatening diseases and improving health outcomes. However, human genomes are very large and require substantial computing, networking, and storage resources to process and analyze. The goal of this project is to leverage FABRIC (https://fabric-testbed.net, an NSF-funded national research infrastructure) for efficient and secure processing and analysis of human genome sequences. Students will be involved in the project's research and outreach activities. Software will be developed for broader use by the education and research community.
This project will lead to a suite of new algorithms and techniques for efficient and secure processing and analysis of human genomes at scale using cluster computing technologies and cutting-edge, programmable hardware available on FABRIC. The research thrusts include (a) acceleration of variant calling pipelines using hardware accelerators and Big Data tools, (b) secure and efficient processing of genomes using advanced networking capabilities, and (c) a workload distribution scheme for combining the resources of FABRIC with CloudLab (https://cloudlab.us/, another NSF-funded testbed) for genome processing. Measurement data on FABRIC and CloudLab will be collected for further analysis. This project will advance the state of the art in scalable human genome sequence analysis, which is essential for the prevention and treatment of complex diseases such as cancer, as well as for drug discovery. It will lead to open-source software for large-scale, cost-effective human genome sequence analysis using cutting-edge, everywhere-programmable testbeds. The findings will be disseminated in the form of publications, software, datasets, and training materials. New coursework will be developed for computer science and informatics students. A summer camp will be conducted for high school students.
The project website is hosted at https://github.com/MU-Data-Science/GAF. This repository will be maintained for 5 years after the completion of the project.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.