Single-molecule sequence assembly and analysis

$1,901,956 · ZIA · FY2025 · HG · NIH

National Human Genome Research Institute

Abstract

This year we released a new major version of our Verkko assembly software, which combines multiple long-read technologies with Hi-C proximity ligation data to phase and assemble entire diploid chromosomes from telomere to telomere (ref 1). Verkko automates the semi-manual genome finishing process that was used to complete the first human genome, and it is enabling the assembly of hundreds of additional human haplotypes. In collaboration with the Human Genome Structural Variation Consortium, we used Verkko to assemble 130 near-complete human haplotypes across a broad set of human samples, revealing complex variation missed by prior approaches (ref 2). Building on this success, we expect to deploy Verkko in the coming year to construct upwards of 1,000 complete human haplotypes for the Human Pangenome Reference Consortium. We are also continuing to evaluate different sequencing strategies for the efficient reconstruction of complete human genomes, and recently demonstrated that it is possible to assemble complete human genomes using nanopore sequencing alone (ref 3).

Following our completion of the ape sex chromosomes last year, we finished the complete genomes for all ape species in 2025 (ref 4). This is a tremendously valuable resource for human comparative genomics, since it is now possible to pinpoint exactly which regions and structures of the genome are specific to humans. Many such regions are highly repetitive sequences and amplified gene families that are highly diverged from the other apes. These complete ape genomes therefore present an opportunity for discovery, specifically in the regions of the genome that differ most between humans and the other apes.

Finally, the GIS remains an active member of the Vertebrate Genomes Project (VGP) and Earth BioGenome Project (EBP), which together aim to sequence the genomes of all eukaryotic life on Earth. These projects are producing extremely valuable genomic datasets that will guide future conservation efforts and enable large-scale comparative genomics. The VGP recently reached its phase one goal of completing at least one genome from almost all of the approximately 270 vertebrate taxonomic orders, which was made possible, in part, by the genome assembly, validation, and alignment software developed by the GIS. This year, with VGP collaborators, the GIS published individual reference genomes for the crab-eating macaque (ref 5) and echidna (ref 6), and we are working towards a second flagship VGP publication in 2026 that will include hundreds of additional species.

In addition to the 6 papers above that were formally published this year, the section posted 4 preprints to bioRxiv in the past year that are currently undergoing peer review: the rhesus macaque genome, a long-read survey of structural and epigenetic variation in hundreds of human brains, a mechanistic explanation for the formation of Robertsonian chromosomes, and an investigation into the epigenetic control and inheritance of human rDNAs.