Development Of Theoretical Methods For Studying Biological Macromolecules
National Heart, Lung, And Blood Institute
Abstract
Variational embedding of protein folding simulations using Gaussian mixture variational autoencoders
Conformational sampling of biomolecules using molecular dynamics simulations often produces large amounts of high-dimensional data that are difficult to interpret using conventional analysis techniques. Dimensionality reduction methods are thus required to extract useful and relevant information. Here we devise a machine learning method, the Gaussian mixture variational autoencoder (GMVAE), that can simultaneously perform dimensionality reduction and clustering of biomolecular conformations in an unsupervised way. We show that GMVAE can learn a reduced representation of the free energy landscape of protein folding with highly separated clusters that correspond to the metastable states during folding. Because GMVAE uses a mixture of Gaussians as the prior, it can directly account for the multi-basin nature of the protein folding free energy landscape. To make the model end-to-end differentiable, we use a Gumbel-softmax distribution. We test the model on three long-timescale protein folding trajectories and show that the GMVAE embedding resembles the folding funnel, with folded states at the bottom of the funnel and unfolded states toward its outer rim. Additionally, we show that the latent space of GMVAE can be used for kinetic analysis: Markov state models built on this embedding produce folding and unfolding timescales that are in close agreement with other rigorous dynamical embeddings such as time-lagged independent component analysis (TICA).
pKa prediction by machine learning
Machine learning techniques have developed rapidly in recent years and have been applied to numerous scientific fields. Here, we trained machine learning models on experimental pKa measurements to predict the pKa values of ionizable groups in a protein from the protein's structure.
All four tree-based models we trained (Random Forest, Light Gradient Boosting Machine, eXtreme Gradient Boosting, and Extra Trees) outperform the widely used pKa prediction tool PROPKA on all three test-set RMSEs (overall, surface groups, and buried groups). We also made pKa predictions for AlphaFold structures and observed a large pKa increase for about 1% of the carboxylic groups.
A compression strategy for particle mesh Ewald theory
We published our compression strategy to reduce the communication burden of particle mesh Ewald calculations, which are a bottleneck in standard molecular dynamics simulations. The method was also implemented and made freely available in the massively parallel helPME code.
To further our efforts to develop tools for computing bilayer bending moduli, we have developed a framework for generating synthetic spectra for a given system, which can then be subjected to different analysis methods to determine the best approach. Using this technique, we are developing a periodic interpolation method that can correctly balance the errors resulting from spectral damping and aliasing.
Analytical Hessians for Ewald and particle mesh Ewald electrostatics
Having developed highly efficient algorithms to evaluate the high-order multipole terms found in the most advanced classical force fields, we are now extending these capabilities by implementing the second derivatives of these high-order terms to obtain Hessian terms, thus opening the door to powerful normal mode analysis with high-accuracy force fields. An extension to second order is also being developed for induced polarization models, in order to fully exploit the capacity of polarizable (AMOEBA-like) force fields. The implementations also include the equations for periodic boundary conditions, so that they can fully exploit the capabilities of modern simulation software.
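As a rough illustration of why analytical second derivatives matter for normal mode analysis, the sketch below builds the analytical Hessian of a simple pairwise potential and diagonalizes its mass-weighted form. It is a toy under stated assumptions, not the Ewald or multipole implementation described above: the harmonic pair potential, the three-atom triangle, the arbitrary units, and the function names `pair_hessian` and `normal_modes` are all hypothetical choices made for this example.

```python
import numpy as np


def pair_hessian(coords, pairs, d1, d2):
    """Analytical Hessian of U = sum over listed pairs of u(r_ij).

    d1(r) and d2(r) return u'(r) and u''(r) for the (assumed) pair potential.
    """
    n = len(coords)
    hess = np.zeros((3 * n, 3 * n))
    for i, j in pairs:
        rij = coords[i] - coords[j]
        r = np.linalg.norm(rij)
        unit = rij / r
        proj = np.outer(unit, unit)
        # Second derivative of u(|r_i - r_j|) with respect to r_i:
        # radial curvature along the bond plus a transverse u'/r term.
        block = d2(r) * proj + d1(r) / r * (np.eye(3) - proj)
        si = slice(3 * i, 3 * i + 3)
        sj = slice(3 * j, 3 * j + 3)
        hess[si, si] += block
        hess[sj, sj] += block
        hess[si, sj] -= block
        hess[sj, si] -= block
    return hess


def normal_modes(hess, masses):
    """Diagonalize the mass-weighted Hessian; eigenvalues are squared frequencies."""
    inv_sqrt_m = np.repeat(1.0 / np.sqrt(masses), 3)
    mass_weighted = hess * np.outer(inv_sqrt_m, inv_sqrt_m)
    return np.linalg.eigh(mass_weighted)


# Toy system (assumption): three unit masses on an equilateral triangle,
# joined by harmonic springs u(r) = 0.5 * k * (r - r0)**2 at equilibrium.
k, r0 = 1.0, 1.0
coords = np.array([[0.0, 0.0, 0.0],
                   [1.0, 0.0, 0.0],
                   [0.5, np.sqrt(3.0) / 2.0, 0.0]])
pairs = [(0, 1), (0, 2), (1, 2)]
masses = np.ones(3)

hess = pair_hessian(coords, pairs,
                    d1=lambda r: k * (r - r0),
                    d2=lambda r: k)
eigvals, eigvecs = normal_modes(hess, masses)

# Six near-zero eigenvalues correspond to rigid translations and rotations;
# the remaining three are the vibrational modes.
n_zero = int(np.sum(np.abs(eigvals) < 1e-8))
print("zero modes:", n_zero)
print("vibrational eigenvalues:", eigvals[n_zero:])
```

In a production force field the pair blocks would be replaced by the second derivatives of the real-space and reciprocal-space electrostatic terms, but the mass-weighting and diagonalization steps would look much the same.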
Host-Guest Binding Affinities in the SAMPL Challenges
The accurate prediction of binding affinities is vital to almost all aspects of rational drug design. As such, the SAMPL challenges serve as a litmus test for the status of current methods for computing binding free energies. The LCB has prided itself on maintaining a strong presence in the free energy community through participation in the SAMPL challenges. In particular, the SAMPL challenges have acted as a blind benchmark to evaluate our most current methods. One of the latest challenges, the SAMPL8 host-guest "drugs-of-abuse" challenge, involved binding seven different narcotic compounds to cucurbit[8]uril. We made five submissions based on mixed quantum mechanical/molecular mechanical (QM/MM) methods, which performed strongly, as well as nine purely classical submissions with parameters generated from QM intramolecular force-matching. Our best submission employed the semi-empirical QM/MM method PM6-D3H4, which gave an RMSE relative to experiment of 2.43 kcal/mol, a Pearson correlation coefficient of 0.78, and a Kendall tau rank correlation coefficient of 0.52. The result is particularly remarkable, as QM/MM methods generally perform worse than classical approaches. We also took part in the SAMPL8 GDCC host-guest binding challenge, in which we restricted our submissions to purely classical methods.
Hybrid differential relaxation algorithm in non-equilibrium switches
Accurately computing free energy differences is fundamental to almost all of chemistry. In particular, highly sophisticated QM/MM Hamiltonians are desirable in most, if not all, free energy calculations for biochemical applications. The challenge is the computational cost associated with performing QM/MM sampling.
This can be circumvented by employing the so-called indirect approach to QM/MM free energies, in which the bulk of the free energy simulations is performed with classical Hamiltonians, and free energy differences between the classical and quantum levels of theory are then computed to correct the classical result to a QM/MM free energy. The overarching challenge in the indirect approach is the computation of the free energy difference between MM and QM/MM. It has been shown that computing the MM-to-QM/MM free energy difference with non-equilibrium approaches drastically improves its convergence. However, in situations where the solute-environment interactions differ significantly between MM and QM/MM, the required length of the switching simulations grows drastically. To ameliorate this, we are presently investigating how well the so-called hybrid differential relaxation algorithm (HyDRA), which periodically freezes the QM solute during the switching simulation while allowing the MM environment to relax around it, facilitates non-equilibrium switching simulations. The goal is to extend this methodology to incorporate the multiple environment-single subsystem approach and so reduce the computational time needed for indirect QM/MM free energy differences.
Action-CSA approach for finding multiple reaction paths
We have developed a new path-finding approach, the Action-CSA method, to find the most dominant reaction pathways between two states using the conformational space annealing (CSA) algorithm. We also created Action-CSA2 by rewriting the CSA implementation in CHARMM so that it can exploit more hardware resources. We assessed the sampling ability of Action-CSA2 using alanine dipeptide and hexane and found that the resulting paths are in reasonable agreement with those obtained from long Langevin dynamics simulations.
We also applied Action-CSA2 to a small model system, 16-alanine, to study the protein folding problem. These results show that our Action-CSA method is promising, but reducing its computational cost so that it can be extended to large systems remains a major challenge.
Determination of van der Waals Parameters Using a Double Exponential Potential for Nonbonded Divalent Metal Cations in TIP3P Solvent