Computational Annotation of Orphan Metabolic Activities

$364,130 | R01 | FY2007 | GM | NIH

Columbia University Health Sciences, New York NY


Linked publications & trials

Papers: 32284567, 25363780, 24288370, 23604282, 23143521, 22960854, 21658583, 20920270, 20833318, 20404177, 19935659, 18326631, 17873044

Abstract

DESCRIPTION (provided by applicant): Even state-of-the-art homology methods cannot annotate metabolic genes that have no or only remote sequence identity to known enzymes. This is a significant obstacle to metabolic network reconstruction: about 30% to 40% (more than 1,500) of known metabolic activities remain orphan, i.e., no protein catalyzing them is known in any organism. The scale of the orphan activity problem makes it arguably the single biggest challenge of modern biochemistry. We propose to develop, experimentally validate, and make available to the scientific community an efficient computational approach for filling the remaining gaps in metabolic networks. The main idea of the proposed method is to use the genes already assigned to the network neighbors of each gap as constraints when assigning genes to orphan activities. We demonstrate that this approach significantly outperforms simpler or existing methods. Our cross-validated results in model organisms show that the proposed method can predict the correct gene in more than 50% of cases without any sequence homology information, and our calculations indicate that prediction accuracy will also remain high in less-studied organisms. Using the developed method, we have already identified and validated a gene responsible for an E. coli metabolic activity that had remained orphan for more than 25 years.

The proposal has four specific aims:

1. We will calculate appropriate context-based descriptors of protein function for the majority of sequenced organisms. Many new functional descriptors will be developed and used for the predictions.
2. We will investigate the ability of various machine learning approaches and fitness functions to integrate context-based descriptors. Using the resulting methodology, we will make predictions for all orphan activities in sequenced organisms.
3. The predictions will be made available through a searchable, continuously updated Web server. We will also develop a method to detect functional misannotations and apply it to all public metabolic databases.
4. In collaboration with the laboratories of Dr. Uwe Sauer (ETH Zurich) and Dr. George Church (Harvard), we will experimentally test at least 50 of the predicted genes without close sequence homologs in E. coli, B. subtilis, and S. cerevisiae.
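The core idea, scoring candidate genes for an orphan activity by how strongly their genomic context links them to genes already assigned to the neighboring reactions, can be illustrated with a minimal Python sketch. Everything here is an assumption for illustration, not the applicants' method: the gene names (geneA, geneX, ...), the two descriptors (co-expression and chromosomal proximity), their values, and the fixed weighted sum, which stands in for the machine learning integration described in aim 2.

```python
from statistics import mean

# Hypothetical context descriptor: maps an unordered gene pair to an
# association score in [0, 1]; unmeasured pairs are treated as 0.0.
Descriptor = dict[frozenset[str], float]


def pair_score(descriptor: Descriptor, a: str, b: str) -> float:
    """Association between two genes under one descriptor (0.0 if absent)."""
    return descriptor.get(frozenset((a, b)), 0.0)


def rank_candidates(
    candidates: list[str],
    neighbor_genes: list[str],
    descriptors: dict[str, Descriptor],
    weights: dict[str, float],
) -> list[tuple[str, float]]:
    """Rank candidate genes for one orphan activity.

    For each descriptor, a candidate's score is its mean association with the
    genes assigned to the orphan reaction's network neighbors; per-descriptor
    scores are then combined with a weighted sum (an assumed, simple
    integrator in place of a trained model).
    """
    if not neighbor_genes:
        raise ValueError("need at least one gene assigned to a neighboring reaction")
    scores: dict[str, float] = {}
    for gene in candidates:
        total = 0.0
        for name, descriptor in descriptors.items():
            per_descriptor = mean(pair_score(descriptor, gene, n) for n in neighbor_genes)
            total += weights[name] * per_descriptor
        scores[gene] = total
    # Highest combined score first; ties broken by gene name for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Made-up example: geneA and geneB catalyze reactions adjacent to the gap.
neighbors = ["geneA", "geneB"]
candidates = ["geneX", "geneY", "geneZ"]
descriptors = {
    "coexpression": {
        frozenset(("geneX", "geneA")): 0.9,
        frozenset(("geneX", "geneB")): 0.7,
        frozenset(("geneY", "geneA")): 0.2,
        frozenset(("geneY", "geneB")): 0.4,
        frozenset(("geneZ", "geneA")): 0.5,
    },
    "proximity": {
        frozenset(("geneX", "geneA")): 1.0,
        frozenset(("geneZ", "geneB")): 1.0,
    },
}
weights = {"coexpression": 0.6, "proximity": 0.4}

ranking = rank_candidates(candidates, neighbors, descriptors, weights)
print([gene for gene, _ in ranking])  # → ['geneX', 'geneZ', 'geneY']
```

A cross-validation in the spirit of the abstract could hide the known gene for an already-annotated reaction, rerun the ranking against its neighbors, and count how often that gene comes out on top; how the applicants actually define neighbors, descriptors, and the fitness function is not specified here.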
