Population-scale, long-read characterization of human genome and microbiome

$191,819 · ZIA · FY2023 · CA · NIH

Division of Basic Sciences - NCI

Abstract

In Aim 3.1, as part of the NIH CARD collaboration, we aim to sequence 4,000+ human brain genomes to study structural variation (SV); this will be the largest long-read-based SV database to date. We have optimized a nanopore sequencing protocol capable of producing 30x whole-genome sequencing from a single experiment. At the same time, we are developing new assembly-based methods for detecting variation at all scales that run within a day for each sample. In Aim 3.2, we are developing long-read assembly methods for improving the resolution of the human microbiome. Microorganisms may play an important role in cancer development and treatment response, and their role may be understudied in part because of our incomplete understanding of human microbiome diversity and function. Here we are developing a tool for strain-level metagenomic deconvolution using long reads, which addresses the limitation of current methods that fail to distinguish genomes of closely related strains or species. Finally, in Aim 3.3 we are developing new hybrid long-read and short-read approaches, as the vast majority of current whole-genome sequencing databases contain Illumina data. We will first explore a pangenome approach to improve somatic SV calling using short reads. Then, we aim to develop a hybrid long- and short-read approach for low-biomass tumors, for which generating deep long-read whole-genome sequencing is impractical. [...] facilitate discovery of known and new biosynthetic gene clusters that encode important natural products.
