
QEIB: Statistical Issues in Combining Data for Phylogenetic Analysis

$125,000 | FY2001 | MPS | NSF

University of New Mexico, Albuquerque, NM

Investigators

Abstract

This research involves the development of methods for combining data sets from different sources with the goal of obtaining robust estimates of phylogenetic relationships. As an example, consider the case in which evolutionary information in the form of DNA sequence data is available from several distinct genes sampled throughout the genome. Phylogenetic trees estimated individually from each of these genes are gene trees, each of which illustrates the evolutionary history of that particular gene. What is often of most interest is the estimation of the species history, which may be different from the gene history. Thus, the genetic information must be combined appropriately so that species relationships can be estimated. This research will address three main issues associated with this problem. The first is the study of currently available tests for assessing combinability of the individual data sets, and the improvement of these existing tests. The second component of this research involves the development of new procedures for testing for similarity in underlying evolutionary history in the data sets, and the development of methods for testing which evolutionary mechanisms (e.g., hybridization or horizontal gene transfer) might be responsible for differing underlying histories. These newly developed tests are based on likelihood ratio statistics, which compare the likelihood of a tree-like structure estimated under the assumption of a particular evolutionary force to an unrestricted likelihood. The final component of this research is the development of appropriate methods for combining data from different sources in order to estimate the species tree. This is achieved by modeling the probability that a set of observed gene trees would have arisen from a given species tree. The estimated species tree is then the tree that maximizes this probability.
The inference of the evolutionary history of a collection of organisms based on the information contained in their DNA sequences is a problem of fundamental importance in evolutionary biology. The abundance of DNA sequence data arising from genome sequencing projects has led to significant challenges in the inference of these phylogenetic relationships. Among these challenges is the inference of the evolutionary history of a collection of species based on DNA sequence information from several distinct genes sampled throughout the genome. This research will address numerous aspects of this problem, including (1) the assessment of existing procedures for combining data from different genes, and the improvement of such procedures; (2) the development of methods for testing for the cause of differences in the evolutionary histories of distinct genes; and (3) the development of new procedures for combining DNA information from distinct genes with the goal of inferring species relationships. This work has applications in the understanding of much-debated species relationships, such as the evolutionary relationships between placental mammals, marsupials, and monotremes.
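
The final component described in the abstract, choosing the species tree that maximizes the probability of the observed gene trees, can be illustrated with a minimal sketch. The abstract does not specify the probability model, so the sketch assumes the standard three-taxon coalescent result: a gene tree matches the species tree with probability 1 - (2/3)exp(-t), and each of the two discordant topologies has probability (1/3)exp(-t), where t is the internal branch length in coalescent units. All function names and the example counts are hypothetical and are not taken from the project.

```python
import math

# The three possible rooted topologies for taxa A, B, C.
TOPOLOGIES = ("((A,B),C)", "((A,C),B)", "((B,C),A)")


def gene_tree_probs(species_tree, t):
    """Gene-tree topology probabilities under the assumed 3-taxon
    coalescent model, for internal branch length t (coalescent units)."""
    discordant = math.exp(-t) / 3.0
    return {
        g: (1.0 - 2.0 * discordant) if g == species_tree else discordant
        for g in TOPOLOGIES
    }


def best_branch_length(counts, species_tree):
    """Closed-form maximum-likelihood estimate of t for a fixed topology."""
    total = sum(counts.values())
    p_hat = counts.get(species_tree, 0) / total  # observed concordance
    if p_hat <= 1.0 / 3.0:
        return 0.0       # boundary case: no support for this topology
    if p_hat >= 1.0:
        return math.inf  # every gene tree is concordant
    return -math.log(1.5 * (1.0 - p_hat))


def log_likelihood(counts, species_tree, t):
    """Log-probability of the observed gene-tree counts."""
    probs = gene_tree_probs(species_tree, t)
    return sum(n * math.log(probs[g]) for g, n in counts.items() if n > 0)


def estimate_species_tree(counts):
    """Return (topology, t, log-likelihood) maximizing the likelihood.
    Keys of counts must be drawn from TOPOLOGIES."""
    candidates = []
    for s in TOPOLOGIES:
        t = best_branch_length(counts, s)
        candidates.append((s, t, log_likelihood(counts, s, t)))
    return max(candidates, key=lambda c: c[2])


# Made-up gene-tree counts, for illustration only.
example_counts = {"((A,B),C)": 70, "((A,C),B)": 16, "((B,C),A)": 14}
tree, t_hat, loglik = estimate_species_tree(example_counts)
print(tree, t_hat, loglik)
```

With only three taxa the search is a comparison of three candidates; for larger numbers of species the candidate set grows rapidly, and a real method would need a search strategy and a richer probability model than this sketch assumes.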
