COMPARATIVE ANALYSIS OF COMPLETELY SEQUENCED GENOMES
National Library of Medicine
Abstract
The rapidly growing database of completely sequenced genomes of bacteria, archaea and eukaryotes (approximately 35 genomes available by the end of 2000 and many more in progress) creates both new opportunities and new challenges for genome research. In order to take advantage of this information, we developed a system of Clusters of Orthologous Groups of proteins (COGs) from 30 completely sequenced genomes. This database is being continuously updated to incorporate newly appearing genomes. The COG system allows nearly automatic functional annotation of 60-80% of the proteins encoded in each of the tested bacterial and archaeal genomes, although only about 30% of the eukaryotic proteins fit into these groups. In addition to functional prediction, this approach provides for the systematic delineation of the set of ancient, conserved protein families that are missing in any particular genome. Examination of evolutionary patterns (i.e., the representation of different species and phylogenetic lineages) in the families of orthologs suggests a major role of horizontal gene transfer and lineage-specific gene loss in the evolution of prokaryotes. More specifically, we found evidence of massive horizontal gene transfer among the archaea, between archaea and thermophilic bacteria, and between bacterial parasites and their eukaryotic hosts. Additionally, we investigated in detail the lineage-specific gene expansions in prokaryotes and their possible adaptive significance, and performed a theoretical analysis of the distribution of evolutionary rates among orthologs from complete genomes.