Identifying Selection Pressures on Viral Genomes

$565,000 | FY2008 | MPS | NSF

University of California-San Diego, La Jolla, CA

Abstract

This study will develop sophisticated likelihood-based statistical models of molecular evolution to infer and investigate evolutionary patterns from alignments of coding sequences. In their most general form, the models allow the pattern of substitution to vary across the phylogenetic tree and across sites in the alignment, and to depend on the residues involved. The models also accommodate the effect of recombination by permitting phylogenetic histories to change between regions of an alignment. These evolutionary patterns will be derived from the data, rather than from a priori assumptions, through the innovative use of genetic algorithms to search for good-fitting models. The investigators will assess convergence and coverage properties of different algorithms. As there may be many models with similar fits to the data, inferences based on the best-fitting model will be compared with those obtained by a weighted average over a sample of models. The unparalleled level of biological realism and statistical sophistication accommodated by these models will permit the research team to perform rapid and accurate high-resolution comparative studies of large collections of genes. New techniques will help bridge the emerging gap between meta-genomic analyses made possible by modern sequencing techniques and existing evolutionary analysis techniques developed in the single-gene era. These models will be developed and validated using a variety of viral datasets, focusing on three pathogens for which there is a wealth of sequence data: human immunodeficiency virus type 1, influenza A virus, and hepatitis C virus.

Over time, evolutionary processes such as natural selection mold every gene into a unique mosaic of sites, some evolving rapidly and others resisting change: an "evolutionary fingerprint" of the gene. Much like ordinary fingerprints, these contain a wealth of information, identifying similarities and idiosyncrasies of individual genes. However, current techniques can identify only coarse evolutionary patterns and are not well suited to the analysis of the extremely large genomic datasets that are being generated at an ever-increasing rate. This project will bring unparalleled resolution to gene fingerprinting using sophisticated mathematical and statistical models and high-performance computing. The project will generate new state-of-the-art software tools for use by the scientific community. The investigators will apply new techniques to viral pathogens that have had a serious impact on global health, including HIV, influenza A virus, and hepatitis C virus, leading to a significant improvement in our understanding of viral evolution.
