Single-molecule sequence assembly and analysis

$1,580,736 | ZIA | FY2021 | HG | NIH

National Human Genome Research Institute

Abstract

The primary focus of the section over the past year has been the completion of the human reference genome, an effort that began when we founded the Telomere-to-Telomere (T2T) consortium a few years ago. The goal of this project is to complete the last remaining regions of the human genome, which stayed unfinished for the 20 years following the completion of the Human Genome Project. With the recent introduction of new long-read sequencing methods, our work toward this goal has accelerated rapidly. Last year's report announced the completion of the first human chromosome (ChrX), and this year we published the first complete human autosome (Chr8, ref 5). We have also continued to generate improved assemblies of human rDNAs (ref 3). Most noteworthy, however, was our release in spring 2021 of the first truly complete assembly of a human genome. This was the result of a tremendous effort over the past year by the GIS and the more than 100 members of the T2T consortium. It is an important landmark and has been a primary aim of the section since its founding in 2015. Its completion relied on the HiCanu software, previously developed and reported by the GIS. It also depended on a close collaboration with the NIH Intramural Sequencing Center (NISC), which provided the nanopore sequencing that was critical to the project's success. The T2T consortium has now joined with the Human Pangenome Reference Consortium (HPRC) to build additional complete human genome assemblies over the coming years. These will come from a broad set of samples, with the goal of fully capturing the landscape of human genomic diversity.

This year also marked the completion of the pilot phase of the Vertebrate Genomes Project (VGP) and the publication of our flagship paper (ref 7) describing the assembly methods developed under the leadership of the GIS. The VGP has now assembled well over 100 high-quality vertebrate genomes, leading to new discoveries about vertebrate genome structure and evolution (ref 7). This flagship paper describes the genomes of 16 vertebrate species. Beyond it, dedicated studies were performed on the genomes of the vaquita porpoise (ref 6), whale shark (ref 8), marmoset (ref 9), and platypus (ref 10). These species are of particular interest because of their conservation status or their relevance to human development and evolution. Their genomes were assembled using many of the techniques developed by the GIS and described in previous annual reports, including the Canu assembler and the trio-binning approach for accurate diploid genome assembly. This year, our section also helped develop new bioinformatic tools for the assembly of vertebrate mitogenomes (ref 1) and for the phased assembly of diploid genomes using Hi-C chromatin interaction data (ref 4). The VGP now continues with phase 1 of the project. This phase aims to assemble one high-quality reference genome for each of the approximately 270 vertebrate taxonomic orders and to perform whole-genome comparative analyses across the entire vertebrate evolutionary tree. Ultimately, we hope the methods developed for the VGP will enable the sequencing, assembly, and comparative genomic analysis of all extant vertebrate species, to better understand genome evolution and function.

In addition to the 10 papers formally published this year and described above, the section has posted 9 preprints to bioRxiv that are currently undergoing peer review. Most of these preprints describe the completion of the human genome and associated analyses of the newly uncovered segmental duplications, satellite repeats, variants, and transposable elements throughout the genome, along with their epigenetic profiles. We also contributed our expertise to drafting a 10-year strategic vision for the institute (ref 2).