Finding Protein Sequence Motifs--Methods and Applications

$399,581 | ZIA | FY2021 | LM | NIH

National Library of Medicine

Abstract

The rapid accumulation of genome sequences and protein structures during the last decade has been paralleled by major advances in sequence database search methods. The powerful Position-Specific Iterated BLAST (PSI-BLAST) method developed at the NCBI forms the basis of our work on protein motif analysis. In addition, we have applied, extensively and increasingly, hidden Markov models (HMMs), profile-against-profile comparison as implemented in HHsearch, protein structure comparison, homology modeling of protein structures, and genome context analysis. Furthermore, we have developed and applied custom libraries of protein domain profiles as well as computational pipelines for identifying novel domains. More recently, these motif search methods have been complemented by deep learning approaches.

During the year under review, we continued and expanded our investigation of protein domains, particularly those encoded in the genomes of viruses of prokaryotes and eukaryotes and of Asgard archaea, the closest archaeal relatives of eukaryotes. The enormous diversity of viruses is far from completely understood, and numerous protein domains, particularly those involved in virus-host interactions, remain to be studied. Over the last year, we thoroughly explored the proteins encoded in the genomes of bacteriophages assembled from metagenomic sequences, including crAss-like phages, the most abundant human-associated viruses, and identified a variety of domains not previously detected in viruses. In addition, we performed a comprehensive analysis of the proteins encoded in the genomes of orthopoxviruses, a genus of large animal viruses that includes smallpox (variola) virus, and determined the domain composition of several uncharacterized viral proteins, leading to testable functional predictions.

In collaboration with the laboratory of Dr. Feng Zhang of the Broad Institute of MIT and Harvard, we studied human proteins containing various derivatives of the capsid proteins of retroviruses and retrotransposons. Eukaryotic genomes contain numerous domesticated genes from integrating viruses and mobile genetic elements, among them homologs of the capsid protein (Gag) of long terminal repeat (LTR) retrotransposons and retroviruses. We identified several mammalian Gag homologs that form virus-like particles, and one LTR retrotransposon homolog, PEG10, that preferentially binds its own messenger RNA (mRNA) and facilitates the vesicular secretion of that mRNA. The mRNA cargo of PEG10 could be reprogrammed by flanking genes of interest with the Peg10 untranslated regions.

Responding to the challenges posed by the COVID-19 pandemic, we studied the functional interaction between the domains of the RNA-dependent RNA polymerase (RdRp) of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). The catalytic subunit of this polymerase, Nsp12, contains a unique nidovirus RdRp-associated nucleotidyltransferase (NiRAN) domain that transfers nucleoside monophosphates to the Nsp9 protein and to the nascent RNA. The NiRAN and RdRp modules form a dynamic interface distant from their catalytic sites, and both activities are essential for viral replication. We found that Nsp12 produced from a gene codon-optimized for pause-free translation in bacterial cells exists in an inactive state in which NiRAN-RdRp interactions are broken, whereas translation by slow ribosomes and incubation with the accessory Nsp7/8 subunits or nucleoside triphosphates (NTPs) partially rescue RdRp activity. Adenosine and remdesivir triphosphates, as well as ppGpp, promote the synthesis of A-less RNAs (RNAs lacking adenosine residues), while amino acid substitutions at the NiRAN-RdRp interface augment activation. Together, these results suggest that ligand binding to the NiRAN catalytic site modulates RdRp activity.
The existence of allosterically linked nucleotidyltransferase sites that use the same substrates has important implications for understanding the mechanism of SARS-CoV-2 replication and for the design of its inhibitors.

During the last year, we also performed a comprehensive analysis of the genomes and proteins of Asgard archaea, whose known diversity was greatly expanded through our collaboration with the laboratory of Dr. Meng Li of Shenzhen University. Our protein domain analysis of 162 Asgard genomes substantially expanded the set of eukaryotic signature proteins (ESPs). The Asgard ESPs show variable phyletic distributions and domain architectures, suggesting dynamic evolution through horizontal gene transfer, gene loss, gene duplication, and domain shuffling. Phylogenomic analysis of the Asgard archaea points to the accumulation of the components of a mobile archaeal 'eukaryome' in the archaeal ancestor of eukaryotes (within or outside Asgard) through extensive horizontal gene transfer.

In summary, over the year under review, our research on protein domains substantially expanded the repertoire of domains known to be encoded by viruses of prokaryotes and eukaryotes and yielded insights into fundamental problems of evolutionary biology, including the origin of eukaryotes. We also performed a study that may inform the design of inhibitors of the SARS-CoV-2 RNA polymerase.
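PSI-BLAST, named above as the basis of this work, searches with a position-specific scoring matrix (PSSM) rather than a single query sequence: each column of the profile scores residues by how enriched they are at that position of an alignment relative to background. The Python sketch below illustrates only that core idea. It assumes gap-free, equal-length aligned sequences, uniform background frequencies, a simple additive pseudocount, and no sequence weighting. The functions `build_pssm`, `score_segment`, and `best_hit` and the toy alignment are hypothetical illustrations, not the NCBI implementation or any motif from this project.

```python
import math

# The 20 standard amino acids; gaps and ambiguity codes are not handled here.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(alignment, pseudocount=1.0, background=None):
    """Return one {residue: log2-odds score} dict per alignment column.

    Hypothetical sketch: assumes equal-length, gap-free sequences, no
    sequence weighting, and (by default) uniform background frequencies.
    These are simplifying assumptions, not PSI-BLAST's exact scheme.
    """
    if not alignment:
        raise ValueError("alignment must contain at least one sequence")
    if background is None:
        background = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}
    width = len(alignment[0])
    if any(len(seq) != width for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    pssm = []
    for pos in range(width):
        column = [seq[pos] for seq in alignment]
        # Additive pseudocounts keep residues unseen in a column from
        # receiving a score of minus infinity.
        total = len(column) + pseudocount * len(AMINO_ACIDS)
        pssm.append({
            aa: math.log2(((column.count(aa) + pseudocount) / total) / background[aa])
            for aa in AMINO_ACIDS
        })
    return pssm


def score_segment(pssm, segment):
    """Sum the position-specific scores of an ungapped segment."""
    if len(segment) != len(pssm):
        raise ValueError("segment length must equal profile width")
    return sum(column[aa] for column, aa in zip(pssm, segment))


def best_hit(pssm, sequence):
    """Slide the profile along a sequence; return (start, score) of the top window."""
    width = len(pssm)
    if len(sequence) < width:
        raise ValueError("sequence is shorter than the profile")
    scores = [score_segment(pssm, sequence[i:i + width])
              for i in range(len(sequence) - width + 1)]
    start = max(range(len(scores)), key=scores.__getitem__)
    return start, scores[start]


# Hypothetical toy alignment of a four-residue motif.
toy_alignment = ["GKST", "GKTT", "GKSS", "GRST"]
profile = build_pssm(toy_alignment)
print(best_hit(profile, "MAGKSTLLE"))
```

Run as written, it reports the window starting at index 2 (`GKST`) as the top-scoring segment of the toy query. PSI-BLAST itself goes further: it searches a database with the profile, adds significant hits to the alignment, and rebuilds the matrix over several iterations, with sequence weighting and substitution-matrix-based pseudocounts among its refinements.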