Laying the Foundation of Genomic Enzymology

$344,094 | R01 | FY2015 | GM | NIH

University of California, San Francisco, San Francisco, CA

Investigators

Linked publications & trials

Paper 34019384 Paper 33674467 Paper 32449511 Paper 32242662 Paper 31074977 Paper 30097089 Paper 29184004 Paper 29111295 Paper 29078300 Paper 28605775 Paper 28365730 Paper 28187133 Paper 28054422 Paper 27899635 Paper 27835946 Paper 27812939 Paper 27650333 Paper 27604469 Paper 26284514 Paper 26073648 Paper 25855808 Paper 25820941 Paper 25654171 Paper 25501940 Paper 25210038 Paper 24756107 Paper 24720347 Paper 24271399 Paper 23737737 Paper 23353650 Paper 23236535 Paper 22962345 Paper 22918439 Paper 22069326 Paper 22069325 Paper 22058127 Paper 21948213 Paper 21489855 Paper 21487022 Paper 21458983 Paper 21118823 Paper 20300652 Paper 20011109 Paper 19854678 Paper 19851441 Paper 19842715 Paper 19701464 Paper 19606141 Paper 19488406 Paper 19472362 Paper 19237310 Paper 19220063 Paper 19190775 Paper 18670595 Paper 18559271 Paper 18428763 Paper 17936488 Paper 17658942 Paper 17503785 Paper 17124868 Paper 16935022 Paper 16740275 Paper 16507141 Paper 16489747 Paper 15759641 Paper 15581566 Paper 15518547 Paper 15504039 Paper 15146493 Paper 14555634 Paper 12859183 Paper 12714057 Paper 12713273 Paper 12520064 Paper 11900527 Paper 11309374 Paper 11262944 Paper 11178260

Abstract

DESCRIPTION (provided by applicant): Of the >50 million protein sequences now in public databases, only a tiny proportion have been experimentally characterized, necessitating assignment of molecular function almost exclusively by computational methods. Many enzymes can be classified as members of functionally diverse superfamilies (SFs): proteins descended from a common ancestor that have diverged to catalyze many different chemical reactions, sometimes using highly dissimilar substrates. Because these proteins all look alike with respect to a subset of active site residues common to all members of each SF, prediction of their molecular functions is especially difficult and plagued by high levels of misannotation. Moreover, these SFs contain thousands of proteins, challenging our ability to manage the data and information about them, or even to determine for which proteins experimental characterization could best be leveraged for functional annotation of, or mechanistic insight into, others of unknown function. The overall goals are to enhance understanding of SF structure-function relationships to improve computational annotation of many enzymes, to inform experimental design of mechanistic studies and enzyme engineering efforts important for human health, and to achieve a more informed theory about how nature re-uses ancestral structural templates to evolve many new enzymatic reactions. A major outcome will be expanded computational characterization of the universe of functionally diverse SFs. The aims are:

1. Create innovative approaches and tools to enhance the protein similarity network technology we pioneered to summarize sequence/structure/function relationships in enzyme SFs and enable their facile and visually interactive exploration at many levels of detail. Major challenges in the use and interpretation of similarity networks will be addressed to establish them as a major tool of genomic enzymology. To achieve this, we will create a new approach to ensure homogeneous similarity signals across similarity networks and to address complications arising from the complex domain architectures of some SFs, helping to avoid misinterpretation of the networks; devise a mechanism for mapping relevant functional features to networks to support efficient visual reasoning; and develop tools to address technical challenges in network generation.

2. Apply network technology to infer functional boundaries in enzyme SFs based on active site variation. Infer functional properties for sequences from metagenomic projects at a level of detail not yet achieved by current curation efforts by incorporating these sequences into our SF networks.

3. Collaborate with experts working on SFs that pose especially relevant challenges for development and application of network technology so that we can learn how best to optimize and deploy it for functional inference for unknowns, identification of new drug targets, and guidance of protein engineering.

All results, including similarity networks, alignments, and other data, will be disseminated through our Structure-Function Linkage Database, served via interactive and other analysis tools.
