A Statistical Learning Framework for Phylogenetic Inference: Information, Uncertainty, and Geometry

$219,573 | FY2020 | MPS | NSF

University of Delaware, Newark, DE

Investigators

Abstract

Phylogenetics, the study of the evolutionary relationships among individuals or groups of organisms from molecular sequence data, is a dominant theme in biological research. In the last few decades, the explosion in the amount of available data for phylogenetic inference has offered great opportunities to further our understanding of various biological processes. At the same time, it has moved phylogenetics into a new learning regime, where traditional theories can no longer guide the development and interpretation of phylogenetic algorithms. The main goals of this research are to improve our understanding of the central concepts of phylogenetics through the viewpoints of statistical learning and information theory and to provide essential tools for phylogenetic analyses in this new learning setting. By providing a framework to design, analyze, and improve phylogenetic estimators, the research will greatly extend the set of problems for which reliable analyses can be obtained. Most notably, our research on the setting where the number of species increases is especially applicable to the biology of small evolving units, including studies of viruses and antibody-making B-cells. The education component of this study involves mentoring undergraduate and graduate students in independent research in phylogenetics and will produce various demonstrations, tutorials, and statistical packages for phylogenetic inference.

The proposed research lays the foundation for explicit quantification of phylogenetic information and uncertainty, with a focus on the setting where sequence data are continually being generated and analyzed. This approach enables the use of local phylogenetic methods as a means to analyze likelihood-based methods and helps investigate systematic ways to stabilize uncertainty while retaining essential information. Two analytical tools to construct and analyze phylogenetic methods in non-asymptotic settings will be developed: a new class of concentration inequalities for evolutionarily related random variables and a Taylor-like local-to-global expansion of the phylogenetic likelihood on the space of phylogenetic trees. These newly derived tools will be used to study several important inference problems, including species tree/supertree reconstruction and parameter estimation in viral phylogenetics and trait evolution. The research activities will provide important insights into the influences of phylogenetic information and uncertainty on the stability of a phylogenetic estimate. This award is co-funded by the Statistics program and the Life Science Venture Fund in DMS. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
