Collaborative Research: BeeHive: A Cross-Problem Benchmarking Framework for Network Biology
Reed College, Portland OR
Abstract
Many important aspects of biology involve relationships between the molecules within cells. For example, a medicine may turn off a disease-associated protein, or a protein may activate an important gene. These individual relationships organize into larger biological networks. Many computational methods aim to predict these network relationships and to identify which of them control essential biological processes. This project will establish a computational framework called BeeHive to support running and comparing modern computational tools for studying biological networks. BeeHive will make it considerably easier to analyze biological data with these methods and to evaluate their strengths and weaknesses. The framework will automatically update a website that tests top methods on a variety of biological use cases, providing benchmarks and assessments for the network biology community. The project will showcase BeeHive with biological applications in gene regulation, protein signaling, and chemical target networks. BeeHive will be used in undergraduate research experiences through a Summer Research Institute across the three project sites.
The project will develop BeeHive, a general platform for multiple types of network biology workflows. BeeHive will provide a shared framework and modular components that implement common elements of network biology analyses, including installation of algorithms, data pre-processing, cross-validation methods, and network visualization. The BeeHive infrastructure will enable running many network algorithms at scale from a single interface. This strategy will support rigorous benchmarking of network algorithms and greatly simplify testing multiple algorithms on a new biological dataset. The project will apply BeeHive to three representative applications: gene regulatory network inference, pathway reconstruction, and small molecule-protein target prediction. These problems are important in the genomics and bioinformatics research communities due to recent computational and biotechnological advances, such as graph neural networks and single-cell RNA-sequencing. Key components of BeeHive will include a modular, general-purpose Python package that can be reused; a template Snakemake workflow that executes the shared steps of network biology analysis, from data pre-processing through network visualization; a framework for continuous benchmarking that uses concepts from continuous integration in software engineering; Docker containers for tens of existing network biology algorithms; and datasets spanning yeast, mouse, human, and plants. Core objectives of BeeHive include advancing computational infrastructure for network biology analysis and benchmarking, and building an active and growing scientific community that develops rigorous, standardized benchmarking frameworks and contributes methods and datasets to BeeHive. In the long term, the project will generalize to other aspects of network biology and can catalyze analogous efforts in other domains of systems and computational biology.
This project will train graduate students and create a Summer Research Institute that hosts six undergraduate researchers per year across the three project sites. Recruitment for the Summer Research Institute will emphasize broadening participation of students from historically marginalized groups. Results from this project will be available at https://bioinformatics.cs.vt.edu/~murali/beehive.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
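The abstract describes running many network algorithms from a single interface and comparing them with cross-validation. The following is a minimal Python sketch of that general idea, not BeeHive's actual API: every name here (`Scorer`, `common_neighbors`, `degree_product`, `k_fold`, `precision_at_n`, `benchmark`) is hypothetical, the two scoring heuristics are simple stand-ins for real network biology algorithms, and the toy graph is made up for illustration.

```python
import random
from typing import Callable, Dict, List, Set, Tuple

Edge = Tuple[str, str]
# Hypothetical shared interface: training edges + candidate edges -> score per candidate.
Scorer = Callable[[Set[Edge], List[Edge]], Dict[Edge, float]]


def neighbors(train: Set[Edge]) -> Dict[str, Set[str]]:
    """Build an undirected adjacency map from the training edges."""
    adj: Dict[str, Set[str]] = {}
    for u, v in train:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def common_neighbors(train: Set[Edge], candidates: List[Edge]) -> Dict[Edge, float]:
    """Stand-in algorithm 1: score a pair by its number of shared neighbors."""
    adj = neighbors(train)
    return {(u, v): float(len(adj.get(u, set()) & adj.get(v, set())))
            for u, v in candidates}


def degree_product(train: Set[Edge], candidates: List[Edge]) -> Dict[Edge, float]:
    """Stand-in algorithm 2: score a pair by the product of node degrees."""
    adj = neighbors(train)
    return {(u, v): float(len(adj.get(u, set())) * len(adj.get(v, set())))
            for u, v in candidates}


def k_fold(edges: List[Edge], k: int, seed: int = 0) -> List[List[Edge]]:
    """Shuffle the edges reproducibly and partition them into k folds."""
    shuffled = edges[:]
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def precision_at_n(scores: Dict[Edge, float], positives: Set[Edge], n: int) -> float:
    """Fraction of the n highest-scoring candidates that are true edges."""
    if n == 0:
        return 0.0
    ranked = sorted(scores, key=lambda e: (-scores[e], e))  # ties broken by edge name
    return sum(e in positives for e in ranked[:n]) / n


def benchmark(algorithms: Dict[str, Scorer], edges: List[Edge],
              non_edges: List[Edge], k: int = 3) -> Dict[str, float]:
    """Run every algorithm on the same folds and report mean precision@n."""
    folds = k_fold(edges, k)
    results: Dict[str, float] = {}
    for name, algo in algorithms.items():
        fold_scores = []
        for held_out in folds:
            train = set(edges) - set(held_out)
            scores = algo(train, held_out + non_edges)
            fold_scores.append(precision_at_n(scores, set(held_out), len(held_out)))
        results[name] = sum(fold_scores) / len(fold_scores)
    return results


if __name__ == "__main__":
    # Toy, made-up network used only to exercise the sketch.
    toy_edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E"), ("B", "D")]
    toy_non_edges = [("A", "E"), ("B", "E")]
    algos = {"common_neighbors": common_neighbors, "degree_product": degree_product}
    print(benchmark(algos, toy_edges, toy_non_edges))
```

Because every algorithm is evaluated on identical folds through the same function signature, adding a method to the comparison only requires registering one more entry in the dictionary. A real framework would presumably wrap containerized tools behind such an interface rather than in-process functions.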