GGrantIndex
← Search

AitF: Full: Collaborative Research: Graph-theoretic algorithms to improve phylogenomic analyses

$360,000FY2015CSENSF

University Of Illinois At Urbana-Champaign, Urbana IL

Investigators

Abstract

Understanding the history of life on earth ? how species evolved from their common ancestor ? is a major goal of biological research. These evolutionary trees are very hard to construct with high accuracy, because nearly all of the most accurate approaches require the solution to computationally hard optimization problems. Furthermore, research has shown that the evolutionary tree for a single gene can be different from the evolutionary tree for the species, and current methods do not provide adequate accuracy on genome-scale data. As a result, large evolutionary trees, covering big portions of ?The Tree of Life?, are very difficult to compute with high accuracy. This project will develop methods that can enable highly accurate species tree estimation. The key approach is the development of novel divide-and-conquer strategies, whereby a dataset is divided into overlapping subsets, species trees are constructed on the subsets, and then the subset species trees are merged together into a tree on the full dataset. These approaches will be combined with powerful statistical estimation methods, to potentially transform the capability of evolutionary biologists to analyze their data. This project will also provide open source software for the new methods that are developed, and provide training in the use of the software to biologists at national meetings. The project will also contribute to interdisciplinary training for two doctoral students, one at Illinois and one at Berkeley, and course materials for computational biology will be made available online. Understanding evolution, and how it has operated on species and on genes, is a major part of biological data analysis. Statistical estimation approaches often provide the best accuracy, but cannot scale to dataset sizes that are required for modern biology. In addition, species tree estimation is challenged by the heterogeneity of evolutionary trees across the genome, and no current methods are able to provide highly accurate species trees for genome-scale data. These challenges make it essential that new methods be developed in order to make highly accurate large-scale evolutionary tree estimation possible under these complex evolutionary scenarios. This project will develop novel algorithmic strategies to address three key problems: supertree estimation, species tree estimation in the presence of gene tree heterogeneity, and scaling statistical methods to large datasets. In addition to developing graph-theoretic algorithms, the project team will establish mathematical guarantees for these methods using chordal graph theory and probabilistic analysis, under stochastic models of gene and sequence evolution.

View original record on NSF Award Search →