Comparative Analysis Of Completely Sequenced Genomes
National Library Of Medicine
Investigators
Linked publications & trials
Abstract
The rapidly growing database of completely and nearly completely sequenced genomes of bacteria, archaea, eukaryotes and viruses (millions of genomes already available and many more in progress) creates both extensive new opportunities and major new challenges for genome research. During the year in review, we performed a variety of studies that took advantage of the genomic information to establish fundamental principles of genome evolution and to investigate the evolution of biologically and medically important groups of organisms. A major project involved a collaboration with Dr Meng Li's laboratory on comprehensive phylogenomic analysis of Asgard archaea. Asgard is a recently discovered superphylum of archaea that appears to include the closest archaeal relatives of eukaryotes. Debate continues as to whether the archaeal ancestor of eukaryotes belongs within the Asgard superphylum or whether this ancestor is a sister group to all other archaea (that is, a two-domain versus a three-domain tree of life). We performed comparative analysis of 162 complete or nearly complete genomes of Asgard archaea, including 75 metagenome-assembled genomes that-to our knowledge-have not previously been reported. Our results substantially expand the phylogenetic diversity of Asgard and lead us to propose six additional phyla that include a deep branch that we provisionally named Wukongarchaeota. Our phylogenomic analysis does not resolve unequivocally the evolutionary relationship between eukaryotes and Asgard archaea, but instead-depending on the choice of species and conserved genes used to build the phylogeny-supports either the origin of eukaryotes from within Asgard (as a sister group to the expanded Heimdallarchaeota-Wukongarchaeota branch) or a deeper branch for the eukaryote ancestor within archaea. Our comprehensive protein domain analysis using the 162 Asgard genomes results in a major expansion of the set of eukaryotic signature proteins. The Asgard eukaryotic signature proteins show variable phyletic distributions and domain architectures, which is suggestive of dynamic evolution through horizontal gene transfer, gene loss, gene duplication and domain shuffling. The phylogenomics of the Asgard archaea points to the accumulation of the components of the mobile archaeal 'eukaryome' in the archaeal ancestor of eukaryotes (within or outside Asgard) through extensive horizontal gene transfer. A major effort was dedicated to the study of the genomic diversity of viruses. Analysis of phage genomes identified in whole-community human gut metagenomes resulted in the delineation of at least three new candidate families of Caudovirales and revealed diverse putative mechanisms underlying phage-host interactions in the human gut. Addition of these phylogenetically classified, diverse, and distinct phages to public databases will facilitate taxonomic decomposition and functional characterization of human gut viromes. A complementary study explored the genomes of highly abundant but poorly characterized phages from the human intestine. CrAssphage is the most abundant human-associated virus and the founding member of a large group of bacteriophages, discovered in animal-associated and environmental metagenomes, that infect bacteria of the phylum Bacteroidetes. We analyzed 4907 Circular Metagenome Assembled Genomes (cMAGs) of putative viruses from human gut microbiomes and identify nearly 600 genomes of crAss-like phages that account for nearly 87% of the DNA reads mapped to these cMAGs. Phylogenetic analysis of conserved genes demonstrates the monophyly of crAss-like phages, a putative virus order, and of 5 branches, potential families within that order, two of which have not been identified previously. The phage genomes in one of these families are almost twofold larger than the crAssphage genome (145-192 kilobases), with high density of self-splicing introns and inteins. Many crAss-like phages encode suppressor tRNAs that enable read-through of UGA or UAG stop-codons, mostly, in late phage genes. A distinct feature of the crAss-like phages is the recurrent switch of the phage DNA polymerase type between A and B families. Thus, comparative genomic analysis of the expanded assemblage of crAss-like phages reveals aspects of genome architecture and expression as well as phage biology that were not apparent from the previous work on phage genomics. The COVID-19 pandemic compelled us to turn to launch a comprehensive evolutionary analysis of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). Understanding the trends in SARS-CoV-2 evolution is paramount to control the COVID-19 pandemic. We analyzed more than 300,000 high-quality genome sequences of SARS-CoV-2 variants available as of January 2021. The results show that the ongoing evolution of SARS-CoV-2 during the pandemic is characterized primarily by purifying selection, but a small set of sites appear to evolve under positive selection. The receptor-binding domain of the spike protein and the region of the nucleocapsid protein associated with nuclear localization signals (NLS) are enriched with positively selected amino acid replacements. These replacements form a strongly connected network of apparent epistatic interactions and are signatures of major partitions in the SARS-CoV-2 phylogeny. Virus diversity within each geographic region has been steadily growing for the entirety of the pandemic, but analysis of the phylogenetic distances between pairs of regions reveals four distinct periods based on global partitioning of the tree and the emergence of key mutations. The initial period of rapid diversification into region-specific phylogenies that ended in February 2020 was followed by a major extinction event and global homogenization concomitant with the spread of D614G in the spike protein, ending in March 2020. The NLS-associated variants across multiple partitions rose to global prominence in March to July, during a period of stasis in terms of interregional diversity. Finally, beginning in July 2020, multiple mutations, some of which have since been demonstrated to enable antibody evasion, began to emerge associated with ongoing regional diversification, which might be indicative of speciation. These and other phylogenomic studies performed over last year enhance our understanding of genome evolution in general, and of the genomes of viruses relevant or human health, in particular.
View original record on NIH RePORTER →