Collaborative Research: ABI Innovation: Quantifying biogeographic history: a novel model-based approach to integrating data from genes, fossils, specimens, and environments
Michigan State University, East Lansing MI
Investigators
Abstract
Forest ecosystems cover a third of the land area of the United States and are a significant economic and cultural resource. Managing forests for the future depends on knowledge of their historical dynamics. For example, after the last Ice Age many species, including trees, generally moved north as environmental conditions became more favorable, leading to large changes in population size and geographic range. Fundamental biological questions about these shifts include: (i) how do plants establish in new locations across great distances in spite of having limited seed dispersal abilities, (ii) to what degree do species travel synchronously as communities or individually, (iii) which species moved the fastest and why, and (iv) where did species reside during the last Ice Age? Traditionally, scientists have used one of three types of data to address these questions: specimens from museums and herbaria matched with contemporary environmental data, DNA sequences that hold imprints of recent and past changes, and species' presence in the fossil record, including ancient pollen deposited in lake sediments. However, studies based on single data types have not been able to fully resolve these questions, largely because integrative computational methods and infrastructure have been lacking. This research will develop methods and software that, for the first time, coherently combine the three main data types and existing theory to provide a more comprehensive understanding of species' biogeographic history. Each type of data has different strengths and weaknesses; utilizing the strengths of each will make best use of the total information on species' range shifts. The methods developed will provide the infrastructure needed to leverage "big data" and enable scientific progress on significant, long-standing questions about species' historical dynamics, which will serve a variety of scientific communities.
This work also serves the national interest, advancing prosperity and welfare by enabling future studies of natural environments, which are an important cultural and economic resource. Knowledge gained by using these new methods can help inform management of natural resources (e.g., forests and grasslands) and functioning ecosystems, and identify geographic regions that are resilient to environmental stress or may contain unique genetic resources to help species adapt. These new computational methods will be produced in an open-source, online, documented, and transparent code development system with which anyone can interact, as well as through two interactive workshops that will emphasize participant diversity. This project will also advance science education through broader impacts at multiple educational levels by (i) partnering with an established K-12 educational program to teach ecological concepts, (ii) designing a course-based undergraduate research experience, (iii) producing educational videos and exhibits at two botanical gardens that collectively reach two million visitors, and (iv) providing training and mentoring to early career scientists and students.
Despite continued improvements in data reliability and accuracy, questions about Quaternary species range shifts remain hotly debated. This debate is fueled in part by known, substantial limitations and biases of the primary data types used to reconstruct biogeographic history (i.e., fossils and inferred paleo-vegetation, current and hindcast species distribution models, and current and ancient genomic data). Conflicting results from past studies regarding the speed of range shifts and location of refugia inferred from different approaches have slowed progress in paleoecology for decades.
This project will develop comprehensive, statistically robust informatic tools to coherently integrate the information content of disparate and heretofore disconnected data types and models for inferring species' genetic, demographic, and biogeographic history. The objective of this research is to build informatic infrastructure that will help scientists leverage information from multiple sources spanning space and time to (a) better estimate key demographic parameters, (b) generate maps of species distributions post-glaciation, and (c) account for uncertainty from each data type. The framework is rooted in Approximate Bayesian Computation, with additional modules that will build on the state of the art in biogeographic inference. The informatic improvements will occur in four stages of increasing novelty and data integration, with specific outputs at each stage. The informatic advances will be evaluated for computational efficiency and effectiveness through analyses of both simulated data and an existing empirical dataset for a foundational tree species, green ash (Fraxinus pennsylvanica). This research will help scientists from many fields benefit most from the ongoing renaissance in methods and databases in genomics, environmental modeling, and paleo-data, and achieve a better understanding of species' past dynamics (demographic growth rates, long-distance dispersal, biotic velocities, etc.) at a spatial and temporal resolution that was previously unachievable. The scientific community will be involved in model and software design via open-source community development and coding on GitHub, and two hands-on workshops. Results from this project can be found online at https://github.com/orgs/TIMBERhub. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.