Latent Space Simulators for the Efficient Estimation of Long-time Molecular Thermodynamics and Kinetics
University of Chicago, Chicago, IL
Investigators
Abstract
Andrew Ferguson of the University of Chicago is supported by an award from the Chemical Theory, Models and Computational Methods program in the Division of Chemistry to establish new theoretical and computational tools to simulate the dynamics of proteins and DNA. Computer simulations provide a means to model and understand the structure, dynamics, and properties of these molecules at a level of detail inaccessible to experiment. The high computational cost of these calculations limits the accuracy of their predictions, since it is difficult to simulate the dynamics of biomolecules for longer than microseconds even on the most powerful supercomputers. In this work, Ferguson will develop novel simulation approaches enabled by machine learning and new mathematical theorems to simulate biomolecules millions of times faster than is currently possible. The crux of the approach rests on the development of ultra-efficient simulators that identify and model only the key variables driving the long-time molecular behavior. The approach is being developed and tested on well-understood fast-folding mini-proteins, and then applied to better understand dysfunction in proteins implicated in cancer, to control the kinetics of DNA double helix formation, and to determine how proteins recognize and bind to DNA. As part of the work, the new computational tools will be made available as free open-source software, and Ferguson is offering mentored research experiences for undergraduate and high school students, serving as an instructor in workshops for City Colleges of Chicago students, and developing molecular simulation training materials for the NSF-supported nanoHUB.org.
Andrew Ferguson of the University of Chicago will develop the theoretical and algorithmic foundations of an approach to generate ultra-long atomistic molecular simulation trajectories of biomolecules that are continuous in space and time. This approach, termed latent space simulators (LSS), is trained over short, discontinuous, enhanced sampling simulation data, and then produces continuous all-atom simulation trajectories obeying the correct structural, thermodynamic, and kinetic statistics at several orders of magnitude lower cost than conventional molecular dynamics. Accelerations are realized by the vastly lower cost of propagating the dynamics within a low-dimensional slow subspace spanned by the collective variables governing the long-time dynamical evolution of the molecular system. The computational implementation employs three specialized deep learning architectures that (i) identify the slow collective variables, (ii) propagate the dynamics within this slow subspace, and (iii) decode back to molecular space. Ferguson is developing the approach for the fast-folding mini-proteins Trp-cage and protein G, and then applying it to recover structural transitions in c-Src kinase, which is overexpressed in cancers, to engineer sequence-dependent hybridization kinetics of DNA oligomers, and to understand binding of transcription factor proteins to DNA. The ultra-long molecular trajectories generated by the LSS can resolve kinetic mechanisms at time scales inaccessible to existing approaches, and the LSS is being made broadly available as free and open-source software. Ferguson will also offer mentored undergraduate and high school research opportunities and reach out to community college students by hosting workshops at City Colleges of Chicago. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
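The three-stage structure described above (identify slow collective variables, propagate within the slow subspace, decode back to molecular space) can be illustrated with a toy sketch. This is not the award's implementation: the abstract does not name the deep learning architectures, so each stage below is replaced by a simple linear stand-in chosen for illustration (a TICA-style projection, a linear autoregressive propagator with Gaussian noise, and a least-squares decoder). The synthetic three-coordinate "trajectory", all function names, and all parameter values are assumptions.

```python
import numpy as np

def fit_slow_mode(X, lag):
    """Stage (i) stand-in: TICA-style linear projection onto the slowest mode."""
    mean = X.mean(axis=0)
    Xc = X - mean
    A, B = Xc[:-lag], Xc[lag:]
    C0 = (A.T @ A + B.T @ B) / (2 * len(A))   # instantaneous covariance
    Ct = (A.T @ B + B.T @ A) / (2 * len(A))   # symmetrized time-lagged covariance
    s, U = np.linalg.eigh(C0)
    W = U / np.sqrt(s)                        # whitening transform: W.T @ C0 @ W = I
    lam, V = np.linalg.eigh(W.T @ Ct @ W)     # eigenvalues in ascending order
    v = W @ V[:, -1]                          # direction with the largest autocorrelation
    return mean, v

def fit_propagator(z, lag):
    """Stage (ii) stand-in: z[t+lag] = a * z[t] + sigma * noise, fit by least squares."""
    z0, z1 = z[:-lag], z[lag:]
    a = (z0 @ z1) / (z0 @ z0)
    sigma = np.std(z1 - a * z0)
    return a, sigma

def fit_decoder(z, X, mean):
    """Stage (iii) stand-in: least-squares linear map from latent z to full coordinates."""
    return (z @ (X - mean)) / (z @ z)

def simulate(a, sigma, n_steps, rng, z_start=0.0):
    """Generate a continuous latent trajectory with the fitted propagator."""
    z = np.empty(n_steps)
    z[0] = z_start
    for t in range(1, n_steps):
        z[t] = a * z[t - 1] + sigma * rng.standard_normal()
    return z

# Hypothetical training data: one slow coordinate plus two fast noise coordinates.
rng = np.random.default_rng(0)
n, lag = 20000, 10
slow = np.zeros(n)
for t in range(1, n):
    slow[t] = 0.99 * slow[t - 1] + 0.1 * rng.standard_normal()
X = np.column_stack([slow, rng.standard_normal(n), rng.standard_normal(n)])

mean, v = fit_slow_mode(X, lag)          # (i) identify the slow collective variable
z = (X - mean) @ v                       # project training data into the latent space
a, sigma = fit_propagator(z, lag)        # (ii) learn latent dynamics at the lag time
w = fit_decoder(z, X, mean)              # (iii) learn the map back to full space

z_new = simulate(a, sigma, 500, rng)     # cheap propagation in one dimension
traj = mean + np.outer(z_new, w)         # decoded trajectory, shape (500, 3)
```

The cost saving in this sketch comes from the same place the abstract describes: each propagation step updates a single latent coordinate rather than the full system. Unlike a generative decoder, the linear decoder here returns only the mean configuration for each latent value, so the fast fluctuations of the original coordinates are not reproduced.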