DMS/NIGMS 2: Scalable Bayesian Inference with Applications to Phylogenetics
William Marsh Rice University, Houston TX
Investigators
Abstract
This project concerns methods for Bayesian inference, a variation on the scientific method that quantifies the degree of certainty in a particular hypothesis. The work is motivated by application to phylogenetic analysis methods, which help to infer evolutionary history and have facilitated great progress towards placing extant and fossil species on the tree of life. However, existing methods are unable to infer a complete tree of life due to performance limitations. Additionally, the metaphor of a tree breaks down when exchange of genes occurs between contemporaneous species, necessitating additional links to form a network of life. This project aims to develop improved methods that not only scale to the challenge of inferring a complete tree of life but do so in a principled way that ensures the ability to quantify the degree of confidence in estimated trees and networks. These improvements are expected to be applicable to other areas of research as well, far beyond phylogenetics and evolutionary biology. This project will also provide training and research opportunities for graduate students and research experiences for teachers.
The Markov chain Monte Carlo (MCMC) algorithm is broadly applicable for Bayesian inference and is often used to implement phylogenetic analysis methods. The overarching goal of this project is to develop techniques that make Bayesian MCMC inference significantly more scalable while retaining mathematical guarantees. While the work will be implemented for and illustrated in phylogenomics, it is applicable to all domains where MCMC is used.
To achieve this, the research aims to develop novel methods and mathematical results in four areas: (1) likelihood functions and calculations designed for parallel computation, to take advantage of modern multi- and many-core computing hardware; (2) sampling over complex graphs, to avoid walking in the space of phylogenetic trees and networks, with its mix of discrete and continuous parameters and the associated complexity of reversible jump moves and Hastings ratio calculations; (3) structured prior distributions to improve mixing; and (4) a divide-and-conquer approach to large-scale inference, building on existing techniques and those developed in this project. In addition to establishing mathematical results, all methods will be implemented and tested thoroughly on simulated and observed biological data.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
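For readers unfamiliar with the MCMC algorithm mentioned in the abstract, the following is a minimal random-walk Metropolis-Hastings sketch in Python. It is not the project's method: the one-dimensional standard normal target, the Gaussian proposal, and all function and parameter names (`log_target`, `metropolis_hastings`, `step_size`, `seed`) are assumptions chosen purely for illustration. Phylogenetic samplers operate on trees and networks with far more elaborate proposals.

```python
import math
import random


def log_target(x):
    # Unnormalized log density of a standard normal.
    # Hypothetical stand-in for a posterior; not a phylogenetic likelihood.
    return -0.5 * x * x


def metropolis_hastings(log_density, x0, n_steps, step_size=1.0, seed=0):
    """Random-walk Metropolis-Hastings over a single real-valued parameter."""
    rng = random.Random(seed)
    x = x0
    log_p = log_density(x)
    samples = []
    accepted = 0
    for _ in range(n_steps):
        # Symmetric Gaussian proposal, so the Hastings ratio equals 1 and
        # the acceptance ratio reduces to the ratio of target densities.
        proposal = x + rng.gauss(0.0, step_size)
        log_p_new = log_density(proposal)
        log_alpha = log_p_new - log_p
        if log_alpha >= 0.0 or rng.random() < math.exp(log_alpha):
            x, log_p = proposal, log_p_new
            accepted += 1
        samples.append(x)
    return samples, accepted / n_steps


if __name__ == "__main__":
    draws, rate = metropolis_hastings(log_target, x0=0.0, n_steps=20000)
    mean = sum(draws) / len(draws)
    print(f"acceptance rate: {rate:.2f}, sample mean: {mean:.3f}")
```

With an asymmetric proposal, the acceptance ratio would also need the proposal-density correction (the Hastings ratio), which is the calculation the abstract describes as a source of complexity for moves over trees and networks.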