Comparative Gene-Structure Prediction in Invertebrates

$279,256 | FY2002 | BIO | NSF

Washington University, Saint Louis MO

Investigators

Abstract

DBI-0132436: Brent, Michael, Washington University, St. Louis

A steadily increasing proportion of biological research is conducted on organisms whose genomes have been sequenced. For many research questions, however, an organism's genome is important primarily because of the proteins it encodes. A critical question in genome analysis is therefore: what are the structures of all the protein-coding genes, and what are the exact sequences of the proteins they encode? The proposed research aims to improve gene-structure prediction in model invertebrates by integrating probabilistic models of gene structure with information from genome comparisons.

Project 1: Probability models for gene-structure prediction using genomic homology. This project focuses on developing probability models that exploit genome comparison to improve gene-structure prediction. A novel aspect of the proposed models is their use of a conservation sequence to represent the degree and pattern of evolutionary conservation at each point in the genome to be annotated. A conservation sequence is a synthesis of genome alignments. The probability models build on the Hidden Markov Model approach used in state-of-the-art gene-structure prediction systems.

Project 2: Enhanced probability models for single-sequence gene-structure prediction. TWINSCAN is the gene-structure prediction system developed with prior NSF support. TWINSCAN integrates genome comparison with probabilistic gene-structure models. This project focuses on improving the single-sequence portion of the gene-structure model and on specializing it for model invertebrates.

Project 3: Parameter estimation module and comparative annotation of invertebrates. This project focuses on (a) development of a complete parameter estimation module that will allow TWINSCAN to be adapted to new genomes easily, and (b) genome-wide gene-structure prediction in invertebrate model organisms. The focus will be on a pair of roundworm genomes and a pair of fly genomes. In each case, genome-wide prediction will be done using patterns of similarity to the related genome as one information source.

Our gene-structure predictions will be provided to the research community through a web site (genes.cs.wustl.edu) and a collaboration with the Ensembl group at the European Bioinformatics Institute.

The ability to systematically, reliably, and affordably predict gene structure would constitute significant progress in high-throughput biology and biotechnology. Further, an algorithm for computing the complete protein sequences of the products of a genome would have important applications. Among these are:

1. Identifying targets for functional and biochemical study, such as novel protein families.
2. Providing accurate protein sequences to secondary- and tertiary-structure prediction programs.
3. Finding genes that give rise to specific phenotypes.

This research would involve undergraduate and graduate students at Washington University in St. Louis, thus contributing to research-based education and national biotechnology infrastructure.
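
The conservation sequence described under Project 1 can be illustrated with a minimal sketch. The abstract does not specify an encoding, so everything below is an assumption made for illustration: the function name `conservation_sequence`, the three-symbol alphabet (`|` for an aligned match, `:` for an aligned mismatch, `.` for an unaligned position), and the simplification that alignments arrive as ungapped blocks against the target genome. It is not a description of the actual TWINSCAN implementation.

```python
def conservation_sequence(target, blocks):
    """Build a per-base conservation string for `target`.

    `blocks` is a list of (start, informant_segment) pairs, each an
    ungapped local alignment of the related (informant) genome to the
    target beginning at 0-based position `start`. The block format and
    symbol alphabet are hypothetical choices for this sketch.
    """
    # Every position starts as unaligned.
    symbols = ["."] * len(target)
    for start, informant_segment in blocks:
        for offset, informant_base in enumerate(informant_segment):
            pos = start + offset
            # Aligned positions are marked as match or mismatch.
            symbols[pos] = "|" if target[pos] == informant_base else ":"
    return "".join(symbols)


# Toy target sequence with two aligned blocks from a related genome.
target = "ATGGCGTTAGCA"
blocks = [(0, "ATGAC"), (8, "AGCA")]
print(target)
print(conservation_sequence(target, blocks))
```

The resulting string has one symbol per target base, so a probabilistic gene-structure model could, in principle, score it alongside the nucleotide sequence at each position.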
