Integrating Atomic-Scale Antigen Modeling and Repertoire-Scale Protein Language Models
Duke University, Durham NC
Abstract
Preparing for the next pandemic demands a system for anticipating pathogenic antigen mutations in the context of individual immune heterogeneity. Molecular Dynamics (MD) simulations of antibody-antigen interactions are the gold standard for accuracy. However, they are computationally intensive and do not scale to the repertoire level. Current repertoire analyses, on the other hand, lack the structural context needed to accurately model interactions specific to a particular antigen. A multiscale approach is therefore necessary to achieve scalability, accuracy, and interpretability in modeling antibody-antigen interactions and antibody evolution. Protein Language Models (PLMs) have shown impressive abilities in capturing protein structure and modes of interaction. These models represent proteins as sentences of amino acids and leverage evolutionary conservation to learn distributional rules, generating rich high-dimensional representations of protein sequences. Like the large language models used in natural language processing, PLMs learn distributional semantics, here corresponding to evolutionary sequence patterns encoded across hundreds of millions of proteins. PLM-based models have demonstrated the capacity to predict protein structure and function from sequence alone. For instance, D-SCRIPT and related PLM-based models have been shown to predict protein-protein and protein-small molecule interactions, while AbMAP, a PLM fine-tuned to capture antibody hypervariability, has enabled rapid, large-scale analysis of antibody repertoires. However, current PLM-based approaches learn overall evolutionary fitness, making them less adept at antigen-specific tasks. In contrast, physics-based approaches are more robust than data-driven approaches because they explicitly consider the statistical distribution of macromolecular states that defines differences in affinity. This makes them particularly valuable early in a pandemic, when limited data are available.
Combining MD simulations with Markov State Modeling (MSM) is effective for modeling the structural dynamics of mutations that lower antibody-antigen association rates, providing a more comprehensive understanding of all states involved in association and dissociation. Therefore, we propose to develop multiscale models that combine detailed physical models with PLMs to model antibody-antigen interactions. These models will use MD simulations to detail the kinetics and thermodynamics of antibody-antigen interactions, providing mechanistic insights into viral immune evasion. By integrating these insights into PLMs, large-scale statistical models can be informed by detailed molecular findings, enhancing predictive accuracy for immune responses and viral evolution. The framework is designed to integrate real-time data quickly, allowing fast adaptation to new genomic, epidemiological, and clinical information. This keeps the models up to date and provides timely insights for tracking viral evolution and immune responses, thereby enhancing pandemic preparedness. The integration of detailed physical models with broad-scale PLMs offers a powerful approach to understanding immune responses and viral evolution, supporting personalized medicine and public health surveillance.
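As a rough illustration of the Markov State Modeling step named in the abstract, the sketch below estimates a transition matrix from a discretized trajectory and derives the equilibrium state populations. It is a minimal sketch and not the investigators' pipeline: the two-state labeling (0 = unbound, 1 = bound), the toy trajectory, and the function names `estimate_msm` and `stationary_distribution` are all assumptions made for illustration, and the estimator is a plain count-based one without a detailed-balance constraint.

```python
from collections import Counter


def estimate_msm(dtraj, n_states, lag=1):
    """Count transitions at the given lag time and row-normalize them."""
    counts = Counter(zip(dtraj[:-lag], dtraj[lag:]))
    matrix = []
    for i in range(n_states):
        row = [counts[(i, j)] for j in range(n_states)]
        total = sum(row)
        if total == 0:
            raise ValueError(f"state {i} has no outgoing transitions")
        matrix.append([c / total for c in row])
    return matrix


def stationary_distribution(T, n_iter=10_000, tol=1e-12):
    """Approximate the equilibrium populations by power iteration."""
    n = len(T)
    pi = [1.0 / n] * n
    for _ in range(n_iter):
        new = [sum(pi[i] * T[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(new, pi)) < tol:
            return new
        pi = new
    return pi


# Hypothetical discretized MD trajectory: 0 = unbound, 1 = bound.
dtraj = [0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0]
T = estimate_msm(dtraj, n_states=2)
pi = stationary_distribution(T)
print("transition matrix:", T)
print("equilibrium populations:", pi)
```

In such a model, the off-diagonal entries of `T` play the role of per-lag association and dissociation probabilities, and the equilibrium populations reflect relative binding stability. A real analysis would use many more states, validate the lag time, and typically rely on a dedicated MSM library.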