Advancing evolutionary genetics through deep learning
Univ Of North Carolina Chapel Hill, Chapel Hill NC
Abstract
Background: A major challenge in evolutionary genomics is to characterize the forces shaping patterns of genetic variation. For instance, disentangling the effects of natural selection on genetic diversity from those of demographic changes is notoriously challenging. This question has important implications for our understanding of key evolutionary processes: how do species successfully adapt to new selective pressures, and can we determine which genes were responsible for these adaptations? Researchers typically address these and similar problems using statistical summaries of genome sequence variation that provide insights into the evolutionary forces at play. However, because such approaches typically rely on a univariate summary of the data, valuable information present in the original dataset is lost. A more fruitful strategy would be to use multidimensional representations of genomic data or even the totality of the input (e.g., a matrix representation of a sequence alignment). Modern deep learning methods represent an enticing route toward using high-dimensional representations of sequence data to make accurate evolutionary inferences. As genomic datasets continue to expand at an ever-growing rate, deep learning methods will need to be capable of leveraging compressed representations of genomic data such as tree sequences: the sequence of evolutionary trees that changes as one moves along a recombining chromosome.
Proposal: The Schrider Lab develops and applies powerful machine learning methods for evolutionary inference. We have developed numerous deep learning tools, including several produced under the highly productive R35 that this application seeks to renew, that leverage high-dimensional representations of genomic data to make more accurate evolutionary inferences than were previously possible. However, we are only scratching the surface of what deep learning can accomplish in our field.
We will therefore develop modern deep learning tools such as graph neural networks and large language models (LLMs, which have revolutionized natural language processing) to solve pressing problems in evolutionary genetics. This work will include the development of efficient tools to make accurate inferences from tree sequence representations of large genomic datasets. Moreover, we will incorporate recent and ongoing advances in the interpretability and robustness of deep learning methods into all of our proposed tools. We will use these methods to answer key evolutionary questions such as the impact of cross-species introgression in Drosophila and the manner in which important mosquito vectors adapt to anthropogenic selective pressures such as insecticides. More broadly, the success of the novel approaches described in this proposal has the potential to transform the methodological landscape of evolutionary genomic data analysis.