AF: Medium: Statistical Inference of Complex Evolutionary Histories
William Marsh Rice University, Houston TX
Investigators
Abstract
Genes are an essential building block of all forms of life. Understanding how genes evolve and diversify their function would contribute significantly to elucidating many phenomena and processes in biology, including how diseases emerge and how to treat them. Genes undergo evolutionary events that range from small-scale ones (e.g., one nucleotide is replaced by another) to large-scale ones (e.g., a gene is duplicated, resulting in more than one copy of the same gene). Accurately identifying these evolutionary events for different gene families is the focus of this project. In particular, the project aims to devise mathematical models, computational techniques, and software products for mapping the trajectory of a gene through time in light of a variety of evolutionary processes. The project will have an impact on biology and biomedicine, will result in publicly available software products that enable new analyses, and will train students at the intersection of computer science, statistics, and biology.
Inferring accurate evolutionary histories, or phylogenies, of species is a major endeavor in evolutionary biology and has implications for all aspects of biology. This inference used to be conducted by sequencing a certain region of interest from the genomes of the species under investigation, building a genealogy, or gene tree, for the region, and declaring the tree to be the species phylogeny. In the post-genomic era, this practice has been replaced by the use of hundreds of genomic regions. While this new practice promises to yield more accurate estimates of the species phylogeny, it also gives rise to a major new challenge, namely accounting for the different evolutionary processes that could be acting simultaneously on the different genomic regions. In particular, three evolutionary processes have been prominent in post-genomic evolutionary analysis: incomplete lineage sorting (ILS), horizontal transfer (or gene flow), and gene duplication/loss (GDL).
Currently, no statistical methods exist for inferring the evolutionary relationships of genes and genomes while accounting for all three of these processes simultaneously. The overarching goal of the project is to develop mathematical models and algorithmic techniques for this task. The project will produce mathematical and algorithmic results, as well as open-source software that enables species phylogeny inference from genome-wide data while simultaneously accounting for ILS, gene flow, and GDL.