Supplement: Enhancing Community Contributions to Bioconductor With Build System Containerization and a GPU for Testing
Dana-Farber Cancer Inst, Boston MA
Abstract
"Bioconductor: An Open-Source, Open-Development Computing Resource for Genomics" is the parent grant of this application for supplemental funds. Bioconductor is an open-source, open-development ecosystem of software and data for the analysis and comprehension of genome-scale experiments. The system is primarily rooted in the R language but has extensive interoperability capabilities and incorporates tools developed in numerous other languages. Bioconductor resources are requested at scientific and industrial sites throughout the world, with an outflow from Bioconductor's cloud distribution systems of approximately 1 TB of software and data per day. The system provides infrastructure for representing genomes and variants, genomic sequences and gene models for numerous model organisms, analytical software for many array and sequencing platforms, and annotation and experiment archives, all checked for consistency and portability on a daily basis. This request for supplemental funds addresses engineering requirements of the build and distribution system, which has had its present structure for over ten years, and adds support for graphics processing units (GPUs) in the build and test system. Our plan includes work on containerization and upscaling of the build and check system, and enhancement of the methods of recording resource use and adverse events in ecosystem build processes. In essence, the proposed work will allow the Bioconductor project to more fully embrace an "Infrastructure as Code" discipline to simplify redeployment and rescaling in response to throughput requirements, and to improve automatic discovery and reporting of system problems.
Because the requirements of the genomic data science community are continually diversifying in terms of complexity and volume of data being generated, and the number of hardware platforms in use is likewise growing, the work proposed cannot be accomplished under the initial funding plan.