Computational Annotation of Orphan Metabolic Activities
Columbia University Health Sciences, New York NY
Abstract
DESCRIPTION (provided by applicant): Even state-of-the-art homology methods cannot annotate metabolic genes that have little or no sequence identity to known enzymes. This is a significant obstacle to network reconstruction, as about 30%-40% (>1500) of known metabolic activities remain orphan, i.e., no protein is known to catalyze them in any organism. The scale of the orphan activities problem makes it arguably the single biggest challenge of modern biochemistry. We propose to develop, experimentally validate, and make available to the scientific community an efficient computational approach to fill the remaining gaps in metabolic networks. The main idea of the proposed method is to use the genes assigned to the network neighbors of a remaining gap as constraints when assigning genes to orphan activities. We demonstrate that this approach significantly outperforms simpler or existing methods. Our cross-validated results in model organisms show that the proposed method can predict the correct genes in more than 50% of cases without any sequence homology information. The calculations indicate that prediction accuracy will also remain high in less studied organisms. Using the developed method, we have already identified and validated a gene responsible for an E. coli metabolic activity that had remained orphan for more than 25 years. The proposal has four specific aims:
1. We will calculate the appropriate context-based descriptors of protein function for the majority of sequenced organisms. Many new functional descriptors will be developed and used for the predictions.
2. We will investigate the ability of various machine learning approaches and fitness functions to integrate context-based descriptors. Based on the developed methodology, we will make predictions for all orphan activities in sequenced organisms.
3. The predictions will be available through a searchable and constantly updated Web server. We will also develop a method to detect functional misannotations and apply it to all public metabolic databases.
4. In collaboration with the laboratories of Dr. Uwe Sauer (ETH Zurich) and Dr. George Church (Harvard), we will experimentally test at least 50 of the predicted genes without close sequence homologs in E. coli, B. subtilis, and S. cerevisiae.
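The main idea described above, using genes already assigned to the network neighbors of a gap to constrain the choice of gene for an orphan activity, can be illustrated with a minimal Python sketch. This is not the applicants' method: the scoring rule (a plain average of pairwise context scores), the function name rank_candidates, and all gene names, reaction names, and score values are hypothetical and chosen only for illustration. The abstract does not specify which context-based descriptors or fitness functions are used.

```python
from statistics import mean


def rank_candidates(orphan, neighbors, assigned, context_score, candidates):
    """Rank candidate genes for one orphan activity.

    Hypothetical scoring rule: a candidate's score is the average
    context-based association (e.g. co-expression or phylogenetic-profile
    similarity) between the candidate and every gene already assigned to
    an activity adjacent to the orphan in the metabolic network.
    """
    # Collect genes assigned to the orphan's network neighbors.
    neighbor_genes = [
        gene
        for activity in neighbors.get(orphan, ())
        for gene in assigned.get(activity, ())
    ]
    if not neighbor_genes:
        return []  # no constraints available for this gap

    scored = []
    for cand in candidates:
        # Missing pairs are treated as "no evidence" (score 0.0).
        score = mean(
            context_score.get(frozenset((cand, gene)), 0.0)
            for gene in neighbor_genes
        )
        scored.append((cand, score))
    # Highest average association first.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy data (all names and numbers are made up for illustration).
neighbors = {"R_orphan": ["R1", "R2"]}
assigned = {"R1": ["geneA"], "R2": ["geneB"]}
context_score = {
    frozenset(("geneX", "geneA")): 0.9,
    frozenset(("geneX", "geneB")): 0.7,
    frozenset(("geneY", "geneA")): 0.2,
}
candidates = ["geneX", "geneY"]

ranking = rank_candidates("R_orphan", neighbors, assigned, context_score, candidates)
for gene, score in ranking:
    print(f"{gene}\t{score:.2f}")
```

In this toy case geneX ranks above geneY because it is more strongly associated with both neighbor genes. A real implementation would presumably combine several descriptors with a learned weighting, as the second aim suggests, rather than a single averaged score.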