Extending Bayesian Phylogenetic Analysis

$245,837 · R01 · FY2006 · GM · NIH

University of California, Berkeley, Berkeley, CA

Abstract

DESCRIPTION (provided by applicant): Phylogenies play a central role in biology. They have had a revolutionary effect in evolutionary biology, where they are used to address questions such as the history of cospeciation between hosts and parasites, the timing of the major events in the diversification of life, and the evolution of ecologically important characters. Phylogenies also serve as an important tool in medicine. For example, phylogenies are widely used in epidemiology, where, among other applications, they have been used to establish the origin and timing of the HIV epidemic in humans. However, modern genomics, with its enormous quantity and high quality of genetic data, poses significant challenges to the field of phylogenetics: how can one make sense of genomic data in a phylogenetic context, and how can these data be used to address interesting and important questions, such as the functional importance of amino acid positions? Bayesian estimation of phylogeny represents one of the most promising recent developments in the field. In this proposal, theory and methods will be developed that will extend Bayesian analysis of genetic data in several important ways. First, methods capable of identifying site and branch combinations under positive selection will be developed. These methods will expand the traditional codon models of DNA substitution to allow switching between selection classes. Second, improved methods of Bayesian inference will be developed for estimating large phylogenetic trees. The improvements will involve variants of Markov chain Monte Carlo that better explore the space of trees. Third, Bayesian methods for estimating divergence times will be developed. These methods will accommodate uncertainty in the phylogenetic tree, model parameters, and calibration times when estimating divergence times of pre-specified groups of species (or viral sequences). Finally, new methods for predicting RNA secondary structure will be developed. These methods will combine information from comparative sequence analysis and the Gibbs free energy.
