TOOLS-PGR: Computational Infrastructure to Enable High-throughput, High-quality Annotations of Compartmentalized Metabolic Networks for Plant Genomes

$2,193,335 | FY2016 | BIO | NSF

Carnegie Institution of Washington, Washington DC

Investigators

Abstract

It has been estimated that agricultural productivity must increase to meet the demands imposed by population growth and climate change. Changing the metabolism of crop species is one way to improve productivity, so increasing our knowledge of plant metabolism can significantly accelerate crop improvement efforts. New DNA sequencing technologies have produced an enormous amount of data, but it has been difficult to obtain useful metabolic information from those DNA sequences. The plant research community needs efficient tools that can extract metabolism-related information from them. This project will produce the tools and datasets needed to systematically characterize the components of metabolism: enzymes, transporters, and pathways. These tools will make it easy to compare the metabolic genetic potential of two or more species and will enable the identification of targets for crop improvement. The project will also offer training opportunities in biochemistry and computer science to postdoctoral associates and students. In addition, workshops will be offered at professional meetings to train members of the plant research community in the use of the tools developed by the project. Finally, the tools will be made available to the scientific community through a web portal.

Accurate and rapid annotation of metabolic enzymes and transporters from sequenced genomes, together with reconstructions of their metabolic networks, are essential resources for interpreting the results of 'omics' data systematically and for generating new hypotheses. This proposal aims to meet these needs by developing a computational pipeline that enables rapid and accurate prediction of the genome-scale metabolic complement of any sequenced plant, based on the large pool of experimentally characterized information.
First, the team will improve the accuracy of enzyme function prediction by adding new classifiers and features to a redesigned machine-learning framework. Adding new classifiers, such as phylogenomics-based function prediction, and new features, such as conserved protein domain architecture and conserved residues, is expected to reduce false-positive predictions for proteins that share high sequence similarity with known enzymes but catalyze distinct reactions. The team will also develop a new learning-based algorithm to predict the subcellular locations of enzymes and reactions for any plant species. The algorithm will start from the localization likelihoods of enzymes, derived from the experimentally determined localization of their orthologs, and combine them with the localization information of neighboring reactions in the metabolic network to propagate localization likelihoods among all the reactions in the network. Another new algorithm will be developed to predict transporters and their substrates. All data generated by this project will be integrated into the Plant Metabolic Network (PMN) databases. In addition, a pipeline will be packaged so that users can submit their genome sequences online and obtain the prediction results through a web server. Finally, innovative, integrated views of metabolic pathways with gene co-expression, transporters, and subcellular compartments will be developed.
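
The idea of propagating localization likelihoods across neighboring reactions can be illustrated with a minimal sketch. This is not the project's algorithm, which the abstract does not specify. It is a generic label-propagation scheme in which each reaction's score is a weighted blend of its own ortholog-derived prior and the average score of its network neighbors. The function names, the `alpha` weight, the iteration count, and the compartment and reaction labels are all assumptions made for illustration.

```python
from collections import defaultdict


def normalize(likelihoods, compartments):
    """Rescale likelihoods to sum to 1; fall back to uniform if all are zero."""
    total = sum(likelihoods.get(c, 0.0) for c in compartments)
    if total == 0:
        return {c: 1.0 / len(compartments) for c in compartments}
    return {c: likelihoods.get(c, 0.0) / total for c in compartments}


def propagate_localization(priors, edges, compartments, alpha=0.5, iterations=50):
    """Hypothetical sketch of likelihood propagation over a reaction network.

    priors: reaction -> {compartment: likelihood}, e.g. derived from orthologs
    edges: pairs of neighboring reactions (e.g. sharing a metabolite)
    alpha: assumed weight kept on a reaction's own prior at every step
    """
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    reactions = set(priors) | set(neighbors)
    # Reactions without any prior evidence start from a uniform distribution.
    prior = {r: normalize(priors.get(r, {}), compartments) for r in reactions}
    scores = {r: dict(prior[r]) for r in reactions}

    for _ in range(iterations):
        updated = {}
        for r in reactions:
            nbrs = neighbors.get(r, ())
            if nbrs:
                avg = {c: sum(scores[n][c] for n in nbrs) / len(nbrs)
                       for c in compartments}
            else:
                avg = prior[r]  # isolated reactions keep their prior
            updated[r] = {c: alpha * prior[r][c] + (1 - alpha) * avg[c]
                          for c in compartments}
        scores = updated
    return scores


# Illustrative data only: R2 has no direct evidence but neighbors R1.
compartments = ["chloroplast", "cytosol"]
priors = {"R1": {"chloroplast": 0.9, "cytosol": 0.1}}
example_scores = propagate_localization(priors, [("R1", "R2")], compartments)
```

In this toy network, the reaction with no direct evidence (`R2`) ends up leaning toward the compartment favored by its neighbor, which is the qualitative behavior the abstract describes. A real implementation would need a principled definition of reaction adjacency and a learned, rather than fixed, weighting.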
