CAREER: A comprehensive computational platform for detecting yet unseen microbial pathogens
William Marsh Rice University, Houston TX
Investigators
Abstract
We are in the golden age of our ability to read and write DNA. The sequencing of genomic data found in nature is now democratized, opening the door to a digital library of countless documents of evolutionary history. In parallel, the synthesis of engineered DNA for widespread societal benefits is now automated and affordable, showing incredible promise in recent years. Indeed, recent advances in reading and writing DNA have the potential to resolve major global challenges, such as boosting crop yield to address food shortages, mitigating pollution through carbon capture, and improving pandemic response and preparedness. While these remarkable technological advances can be used for broad societal benefit, they are underutilized for tracking yet-unseen pathogens that can result in widespread economic and public harm. Our ability to read and write DNA at scale, especially with respect to uncovering yet-unseen pathogens and intentionally or unintentionally enhancing existing pathogens, has far outstripped computational tools capable of tracking and preventing misuse. To address this critical gap, the research detailed in this proposal will focus on developing computational tools to aid in detecting yet-unseen pathogens and preventing intentional or unintentional misuse of synthetic DNA. This project will advocate for a novel paradigm of pathogen detection and monitoring through the pursuit of innovative computational methods and approaches. The research methodology will be motivated by tried and tested approaches in biosurveillance while pursuing innovative computational strategies. 
Specifically, this project will address four fundamental computational research challenges: (1) yet-unseen pathogen characterization -- contextualizing taxonomy-based approaches with functions of concern to learn how to identify novel pathogens, (2) petabyte-scale cataloging of microbial dark matter -- combining probabilistic algorithm development with comparative genomic approaches to query known and rare microbial genes, (3) genetic engineering detection -- discerning engineered DNA from naturally occurring DNA through the development of graph-based pangenomes combined with codon usage bias models, and (4) implementation of the modular computational platform GuarDNA -- integrating the three preceding components into the first-ever comprehensive platform specifically designed for both biosecurity and biosurveillance. GuarDNA will be designed following software engineering best practices, with code modularity as a key focus to facilitate community engagement. These four research challenges will be accompanied by a comprehensive test and evaluation plan, which provides both an individual assessment of each of the four research thrusts and continuous integration testing for an overarching evaluation of the GuarDNA platform. This research effort will open the door to novel computational approaches for biosecurity and biosurveillance. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
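As a rough illustration of the codon usage bias idea named in challenge (3), the sketch below compares the codon frequency profile of a query coding sequence against that of a reference sequence. It is not the project's method: the abstract does not describe the actual model, and the function names `codon_frequencies` and `js_divergence`, the choice of Jensen-Shannon divergence as the distance, and the assumption that sequences are already in frame are all hypothetical choices made for this example.

```python
from collections import Counter
from math import log2


def codon_frequencies(seq: str) -> dict[str, float]:
    """Relative frequency of each codon in a coding sequence.

    Assumption: the sequence starts in frame. Trailing bases that do not
    fill a codon, and codons with non-ACGT characters, are ignored.
    """
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    codons = [c for c in codons if set(c) <= set("ACGT")]
    counts = Counter(codons)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {codon: n / total for codon, n in counts.items()}


def js_divergence(p: dict[str, float], q: dict[str, float]) -> float:
    """Jensen-Shannon divergence (base 2) between two codon distributions.

    Ranges from 0.0 (identical usage) to 1.0 (no codons in common).
    """
    keys = set(p) | set(q)
    # Mixture distribution: the midpoint of p and q.
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl_to_mixture(a: dict[str, float]) -> float:
        # Kullback-Leibler divergence of a from the mixture m.
        return sum(a[k] * log2(a[k] / m[k]) for k in a if a[k] > 0)

    return 0.5 * kl_to_mixture(p) + 0.5 * kl_to_mixture(q)


if __name__ == "__main__":
    # Toy sequences, invented for illustration only.
    reference = "ATGGCTGCTGCTAAAGCTTAA"
    query = "ATGGCGGCGGCGAAGGCGTAA"  # same protein, different synonymous codons
    score = js_divergence(codon_frequencies(reference), codon_frequencies(query))
    print(f"codon usage divergence: {score:.3f}")
```

A higher divergence means the query uses synonymous codons differently from the reference. Any threshold for flagging a sequence as potentially engineered would have to be calibrated on real data, and this simple score ignores the graph-based pangenome context that the abstract pairs with codon usage models.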