The Berkeley-TIGR Phylogenomic Encyclopedia of Microbial Protein Families
University of California-Berkeley, Berkeley, CA
Abstract
Proteins are the workhorses of the cell; they form the structure of bones and hair, transmit signals between cells, unwind and repair DNA, and perform a host of other functions. They are the targets of pharmaceuticals, and annual domestic investment in protein research is in the billions of dollars. Proteins are constructed by complex cellular machinery from their genomic templates, genes. When a genome is sequenced, one of the first questions asked, once the genes have been identified but nothing is yet known about their actual function, is "What do these genes do?" Computer scientists have been developing methods to answer this question, and these methods are growing steadily more sophisticated. Among the most exciting methods developed in recent years is an approach called phylogenomics, which uses phylogenetic (evolutionary) analysis to predict the function of a protein in the context of the family to which it belongs. Just as a political scientist might study the changes in American society stemming from potentially conflicting agendas on both sides of the aisle, computational biologists study how protein families evolve novel functions and structures through disparate (and often competing) evolutionary processes, some of which act to conserve gene and protein function while others enable new functions to evolve. These mathematical models of evolution enable biologists to pinpoint, fairly precisely and accurately, the most likely function of genes for which no experimental evidence may be available. The Berkeley Phylogenomics Group and The Institute for Genomic Research (TIGR) have teamed up to develop the first major biological database enabling phylogenomic analysis of microbial genomes, the PhyloFacts Microbial Encyclopedia. This unique web resource will contain pre-calculated evolutionary, structural and functional analyses for hundreds of thousands of proteins, and is designed to vastly improve the quality of functional annotation of microbial genomes.
This powerful combination will provide an unparalleled resource to investigators in microbial biology, as well as a platform for virtual collaborations among investigators worldwide. The intellectual and scientific merits of the project lie on several levels: new understanding of the mechanisms underlying microbial genome evolution and of the functional roles of previously unclassifiable genes; detailed analysis and quantification of annotation error rates; and the development of a resource for the microbial biology community. The broader impacts of this work will be enhanced through several mechanisms, including the development of a two-week summer short course in phylogenomic inference and microbial biology for high school and university students and for investigators at other institutions; the inclusion of undergraduate and graduate students at UC Berkeley in microbial genome annotation using the resource; and the availability of a phylogenomic resource for the entire microbial biology community. Finally, all software developed for this project will be released as open source, and all results from this project will be freely available for download to investigators in the public sector.