Characterizing the Effects of Sequence Variability in Molecular Function, Evolution and Design

$541,096 | R35 | FY2025 | GM | NIH

University of Texas at Dallas, Richardson, TX

Abstract

Proposal Summary

Biological sequences that perform a functional role evolve through changes that tend to preserve their original function or through the acquisition of new properties that may confer a fitness advantage. The processes that give rise to novel functional sequences, and those that explain why a given biomolecular change has a negative effect, are not yet fully understood. Solving this puzzle is challenging given the complex network of interactions and constraints to which biomolecules are subject during evolution. This MIRA project aims to continue our efforts to understand, characterize, and utilize landscapes of sequence variability and the effects of sequence change on the function and fitness of biomolecules. We have developed tools based on epistatic probabilistic models, latent generative models, and evolutionary dynamics to gain a predictive understanding of how sequence composition, as well as novel regions of sequence space, lead to biomolecules with particular functional properties and fitness. Over the past years, we have contributed evidence that coevolutionary models constructed from extant sequence data can infer whether novel sequences will have certain functional properties or whether changes will be detrimental. In this project, we propose to continue developing state-of-the-art machine-learning techniques, aided by our interpretable epistatic models, to 1) increase the accuracy with which we determine sequence fitness landscapes and navigate those landscapes effectively, 2) design and experimentally test novel biomolecules with desired functional properties, and 3) understand the dynamics of functional preservation in pathogenic molecules, which could aid in forecasting variants that contribute to disease or antibiotic resistance.

This project will allow us to extend our Latent Generative Landscapes (LGL) and our models of sequence evolution with epistatic contributions (SEEC) to new and more complex systems than those we have worked on in the past, including metal transporters, temperature-sensing proteins, proteases, viral proteins, and antibiotic-resistance enzymes. Our key hypothesis is that accurately modeling the latent manifold of protein sequence space will enable better inference of the functional classification of many uncharacterized proteins, help us understand and fine-tune the most important determinants of functional selectivity, and predict evolutionary trajectories of biomolecules that could then be used for therapeutic purposes such as vaccine development and drug design.