EAGER: Estimating Phylogenetic Trees when Character Evolution is neither Independent nor Identically Distributed
University of Pennsylvania, Philadelphia, PA
Investigators
Abstract
The evolutionary tree, or phylogeny, of a set of species is a tree that explains the history of their evolution from a common ancestor. To estimate this history, scientists make observations about species that are alive today and seek a tree that best fits these data. The most common observations today are biomolecular sequences, such as DNA or protein sequences. These sequences are aligned so that corresponding positions exhibit as much similarity as possible. Each column of the alignment is called a site or a character. In the standard model of evolution, each node in the tree has a certain state for each character and transmits this state to its children; however, the state is mutated probabilistically along each edge of the tree. The standard model also assumes that all characters evolve according to independent, identically distributed stochastic processes. Tight bounds are known for the number of characters needed to infer the tree (and the mutation probabilities on its edges) under these assumptions. The problem is that these assumptions are not biologically realistic: it is well known that selection pressure operates differently on different sites and that the evolution of one character can depend on other characters. Much more sophisticated mathematical analysis is needed to infer the tree and the dependence structure under these conditions, and this is the major goal of this project. The problem of reconstructing evolutionary trees is of central importance in biology, since evolution is the field's unifying theory. Computer scientists have contributed to this field, but the solutions they have provided so far simplify the problem too much to produce reliable results on realistic data. This project aims to take an important step towards making tree inference algorithms more realistic.
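The standard model described above can be illustrated with a minimal simulation sketch. It is not the project's method. It assumes a four-letter DNA alphabet, a single change probability per edge, and a uniform choice among the other three states when a change occurs (a Jukes-Cantor-style simplification). The names `simulate_site`, `simulate_alignment`, and `leaves_of`, and the example tree, are hypothetical and chosen only for illustration.

```python
import random

STATES = "ACGT"  # assumed alphabet; the abstract does not fix one


def leaves_of(tree, root):
    """Return the leaf names of a rooted tree given as {node: [children]}."""
    out, stack = [], [root]
    while stack:
        node = stack.pop()
        children = tree.get(node, [])
        if children:
            stack.extend(children)
        else:
            out.append(node)
    return sorted(out)


def simulate_site(tree, root, edge_prob, rng):
    """Simulate one character: each node passes its state to its children,
    and the state changes along edge (parent, child) with probability
    edge_prob[(parent, child)]."""
    states = {root: rng.choice(STATES)}
    stack = [root]
    while stack:
        parent = stack.pop()
        for child in tree.get(parent, []):
            state = states[parent]
            if rng.random() < edge_prob[(parent, child)]:
                # Assumed substitution rule: pick one of the other states uniformly.
                state = rng.choice([s for s in STATES if s != state])
            states[child] = state
            stack.append(child)
    return states


def simulate_alignment(tree, root, edge_prob, n_sites, seed=0):
    """Simulate n_sites characters independently from the same process
    (the i.i.d. assumption) and return the leaf sequences only."""
    rng = random.Random(seed)
    leaves = leaves_of(tree, root)
    columns = [simulate_site(tree, root, edge_prob, rng) for _ in range(n_sites)]
    return {leaf: "".join(col[leaf] for col in columns) for leaf in leaves}


if __name__ == "__main__":
    # Hypothetical three-species tree with illustrative edge probabilities.
    tree = {"root": ["anc", "C"], "anc": ["A", "B"]}
    edge_prob = {
        ("root", "anc"): 0.05,
        ("root", "C"): 0.20,
        ("anc", "A"): 0.10,
        ("anc", "B"): 0.10,
    }
    for name, seq in simulate_alignment(tree, "root", edge_prob, 40).items():
        print(name, seq)
```

Every site here is drawn from the same process, independently of the others. Relaxing either property, for example by giving each site its own edge probabilities or by letting one site's state influence another's, leads to the setting the project targets.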