SG: Exploring the Impact of Model (Mis) Specification on Empirical Divergence-Time Estimates
University of California-Davis, Davis, CA
Investigators
Abstract
Phylogenetic trees - estimates of the evolutionary relationships among species - have become central to virtually all areas of research in systematics, evolutionary biology, ecology, molecular biology, and epidemiology because of the essential and explicit historical perspectives they provide. Phylogenies have extended the branches of their influence from the scientific to the public realm, informing decisions regarding the surveillance and monitoring of pathogens, vaccine design, and conservation priorities. Although many phylogenies are based on molecular sequence data (DNA) collected from extant species (or strains), these trees can also provide information regarding the absolute or relative branching times within a lineage. This temporal information is critical to many questions, such as inferring when a virulent strain of a virus first arose and estimating how quickly it is changing. These considerations have motivated the development of a large number of mathematical models for inferring the time scale of evolutionary trees. This project seeks to assess the reliability of these mathematical models using empirical datasets, and will inform researchers and the public alike on the best practices for using these methods. Undergraduates recruited through the University of California Historically Black Colleges and Universities initiative and the UC Davis Initiative for Maximizing Student Diversity will be trained in statistical phylogenetic methods and bioinformatics. All software developed will be distributed freely under open-source licenses. The researcher will also develop a new, stand-alone workshop and associated teaching materials on Bayesian divergence-time estimation methods to be used broadly, including by researchers who do not work directly in phylogenetic research.
The primary objective of this research is to explore the statistical behavior of Bayesian methods for estimating species divergence times in an empirical setting. This goal will be achieved by applying all currently implemented relaxed-clock models and calibration methods to a large sample of empirical datasets to: (1) reveal the extent to which divergence-time estimates are sensitive to the specified relaxed-clock model and calibration method; (2) explore the relative influence of the three primary model components - branch-rate priors, node-age priors, and calibration approaches - on divergence-time estimates; (3) assess the relative fit of the pool of candidate relaxed-clock models to real data using robust Bayesian model-comparison methods; and (4) develop analytical protocols and implement pipelines that automate the efficient exploration of relaxed-clock model space for empirical analyses. Facilitating more careful model selection will improve our ability to estimate divergence times, which, in turn, will benefit both the scientific community and the wider public.