Development Of Advanced Computer Hardware And Software

$1,113,655 · ZIH · FY2025 · HL · NIH

National Heart, Lung, And Blood Institute

Abstract

Several projects were pursued during the reporting period.

Development of new simulation methods for CHARMM and AMBER

Molecular simulation and modeling software packages are the primary vehicles for computational research and experiment, and implementing new methods and options in them is key to enabling cutting-edge research. In recent years, our lab has developed a series of new computational methods, such as self-guided Langevin dynamics (SGLD) for efficient conformational searching and sampling, the isotropic periodic sum (IPS) method for accurate and efficient calculation of long-range interactions, and the map-based modeling tool EMAP for electron microscopy studies. Implementing these methods enables researchers to tackle difficult problems. We implemented them in CHARMM to expand its capabilities in molecular simulation, conformational search, and structure prediction, and all of them are available in CHARMM. In addition, the double exponential potential with IPS is available in CHARMM version 49. These methods have also been implemented in AMBER, another widely used simulation package, to make them accessible to more users; SGLD, IPS, and EMAP are available in AMBER version 24. In the past fiscal year, we implemented free energy calculation for double exponential potential systems in CHARMM and a GPU version of SGLD in AMBER PMEMD.

Development of Radial Threshold Clustering based on n-ary similarity

Radial Threshold Clustering (RTC) is a non-hierarchical method that clusters the frames of a trajectory according to RMSD thresholds around seed frames. As originally implemented, RTC is an O(N²) algorithm, which limits its usefulness for larger datasets, and its results are non-deterministic because they depend on the order of the input frames. We have developed an extension of RTC called Extended Quality Clustering (eQual), an O(N) algorithm with several novel features.
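To make the two weaknesses of the original RTC concrete (quadratic cost and order dependence), here is a minimal sketch of a quality-threshold-style RTC. It assumes frames are already superimposed (RMSD is computed without fitting) and that the seed is the frame with the most unassigned neighbors within the threshold; the function names are hypothetical, and this is not the lab's RTC or eQual code.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]
Frame = Sequence[Coord]

def rmsd(a: Frame, b: Frame) -> float:
    """RMSD between two pre-aligned frames (assumption: no superposition is performed)."""
    sq = sum((xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
             for (xa, ya, za), (xb, yb, zb) in zip(a, b))
    return math.sqrt(sq / len(a))

def radial_threshold_clustering(frames: List[Frame], threshold: float) -> List[List[int]]:
    """Hypothetical RTC sketch: repeatedly seed on the densest frame and peel off its neighbors."""
    n = len(frames)
    # O(N^2) step: every pair of frames is compared once.
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dist[i][j] = dist[j][i] = rmsd(frames[i], frames[j])
    unassigned = set(range(n))
    clusters = []
    while unassigned:
        # Seed = unassigned frame with the most unassigned frames inside the threshold.
        # max() keeps the first maximum it meets, so ties are broken by frame order.
        seed = max(sorted(unassigned),
                   key=lambda i: sum(1 for j in unassigned if dist[i][j] <= threshold))
        members = sorted(j for j in unassigned if dist[seed][j] <= threshold)
        clusters.append(members)
        unassigned.difference_update(members)
    return clusters
```

With one-atom frames at x = 0, 0.5, 10, 10.4, and 20 and a threshold of 1.0, this returns the clusters [0, 1], [2, 3], and [4]. Frames 0 and 1 tie as seeds, and reordering the input would change which one wins; eQual's use of extended similarity indices to pick the densest, most compact cluster is aimed at removing that dependence.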
We sped up seed selection by using k-means++ to choose seeds from the available frames. To address the second issue and make the results invariant to frame order, the densest and most compact cluster is chosen using extended similarity indices. The new algorithm clusters in linear time and produces more compact and better-separated clusters.

Development of a density-based clustering algorithm using extended similarity metrics

As the amount of data generated by molecular dynamics (MD) simulations continues to grow, the techniques used to analyze these data must keep up. Clustering of MD trajectory data is an important method for identifying both the major conformational states of the simulated biomolecules and any potential transition states. Traditionally, clustering algorithms rely on pairwise comparisons between all structures in an ensemble and consequently scale as O(N²). Extended similarity techniques, which efficiently assign each structure a score that distinguishes it from the ensemble, are an attractive alternative to pairwise comparisons because they scale as O(N). In our previous work we applied extended similarity metrics to K-Means-style clustering. In this work, we introduce a novel density-based clustering algorithm using extended similarity metrics, called Clustering Algorithm Density-based Exploration of Nearest Common Environments (CADENCE). We show that CADENCE can effectively identify metastable states of various sizes and densities through its radial threshold search for high-density regions and its nearest-neighbor search for the lower-density parts of clusters.

Speeding up computationally intensive analysis in CPPTRAJ via GPU acceleration

The use of graphics processing units (GPUs) has allowed the amount of data generated by MD simulations to increase exponentially over the past decade and a half.
While there has been significant development of GPU-based MD software for generating data, there has been much less of a push to develop GPU-based analysis software. We have focused our efforts on using GPUs to speed up time-consuming analyses in the MD analysis program CPPTRAJ. We have increased the speed of several analyses by multiple orders of magnitude, particularly those that require computing millions of distances per frame, such as determining the solvent molecules closest to a given solute, counting the waters in the first and second solvation shells, the non-bonded energy calculation in the Grid Inhomogeneous Solvation Theory (GIST) method, and calculating the radial distribution function around atoms. We continue to collaborate with NVIDIA to rework and upgrade existing GPU-accelerated analyses to better leverage modern GPU hardware (e.g., the greatly increased number of tensor cores in modern GPUs), and to adapt other time-consuming analyses (such as volumetric map density, hydrogen bond determination, rotational diffusion, and clustering) for GPUs.

A Hybrid Machine Learning and Molecular Mechanics Potential

This work develops a hybrid machine learning/molecular mechanics (ML/MM) interface integrated into the AMBER molecular simulation package. The resulting platform is highly versatile: it accommodates several advanced machine learning interatomic potential (MLIP) models while providing stable simulations and supporting high-performance computation. Building on this foundation, we developed new computational protocols that enable pathway-based and end-point-based free energy calculations with the ML/MM hybrid potential. In particular, we proposed an ML/MM-compatible thermodynamic integration (TI) framework that addresses the challenge of applying MLIPs in TI calculations, a challenge that arises from the indivisible nature of MLIP energies and forces.
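For readers less familiar with TI: the free energy difference is estimated as Delta G = integral from 0 to 1 of <dU/dlambda>_lambda dlambda, evaluated by averaging dU/dlambda at a set of lambda windows and integrating numerically. The sketch below assumes, purely for illustration, that the two end-state potentials U0 and U1 are mixed linearly as whole potentials, so dU/dlambda = U1 - U0 per frame and no decomposition of the MLIP energy is needed. The source does not say that this is how the ML/MM-compatible framework works, and all function names are hypothetical.

```python
from typing import Sequence

def mixed_potential(u0: float, u1: float, lam: float) -> float:
    """Assumed linear mixing of whole end-state potentials: U(lam) = (1 - lam) * U0 + lam * U1."""
    return (1.0 - lam) * u0 + lam * u1

def window_mean_dudl(u0_samples: Sequence[float], u1_samples: Sequence[float]) -> float:
    """<dU/dlam> at one lambda window; under linear mixing dU/dlam = U1 - U0 for each frame."""
    if len(u0_samples) != len(u1_samples) or not u0_samples:
        raise ValueError("need equal, non-empty lists of end-state energies")
    return sum(u1 - u0 for u0, u1 in zip(u0_samples, u1_samples)) / len(u0_samples)

def ti_trapezoid(lambdas: Sequence[float], mean_dudl: Sequence[float]) -> float:
    """Delta G ~ integral of <dU/dlam> over lambda, using the trapezoidal rule between windows."""
    if len(lambdas) != len(mean_dudl) or len(lambdas) < 2:
        raise ValueError("need matching lambda and <dU/dlam> values for at least 2 windows")
    return sum(0.5 * (lambdas[k] - lambdas[k - 1]) * (mean_dudl[k] + mean_dudl[k - 1])
               for k in range(1, len(lambdas)))
```

For example, window averages of -4.0, -2.0, and 0.0 kcal/mol at lambda = 0, 0.5, and 1 give ti_trapezoid([0.0, 0.5, 1.0], [-4.0, -2.0, 0.0]) = -2.0 kcal/mol. Production TI usually uses more windows or Gaussian quadrature; the trapezoidal rule is chosen here only for transparency.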
Our results demonstrated that hydration free energies calculated with this framework achieved an accuracy of 1.0 kcal/mol, outperforming traditional approaches. Moreover, ML/MM enables more precise sampling of conformational ensembles, improving end-point-based free energy calculations. Overall, our efficient, stable, and highly compatible interface not only broadens the application of MLIPs in multiscale simulations but also improves the accuracy of free energy calculations in multiple respects.

New method implementations in apoCHARMM

apoCHARMM is a high-performance molecular dynamics package optimized for GPUs. It is built on a C++/CUDA backend for efficiency and has a Python interface for ease of use. Recently, we added several new features to the package. We implemented support for creating Protein Structure Files (PSF), in addition to the existing ability to read them. This allows users to configure simulations directly within apoCHARMM, eliminating the need for slower third-party tools such as CHARMM-GUI for system setup. We are also actively developing methods for calculating relative free energy differences, including both dual-topology and single-topology schemes, which are being implemented by modifying the PSF files. A highly efficient scheme for performing constant pH simulations using Enveloping Distribution Sampling (EDS) is also under development. Unlike the previous design, which required recalculating energy terms and made scaling to multiple sites difficult, the new implementation efficiently handles exclusion terms in reciprocal-space calculations. Because common energy terms are calculated only once, the new approach scales well to simulating protonation at multiple sites.
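As background, standard EDS combines N end states (here, protonation states) into one reference potential, V_R = -(1/(beta*s)) ln sum_i exp[-beta*s*(V_i - E_i^R)], with energy offsets E_i^R and smoothing parameter s. If every V_i equals a shared term plus a state-specific term, the shared term factors out of the sum exactly, which is why computing common energy terms once pays off. The sketch below shows only this identity; it assumes energies in kcal/mol, uses hypothetical function names, and does not reproduce apoCHARMM's API or its reciprocal-space exclusion handling.

```python
import math
from typing import Sequence

KB_KCAL_PER_MOL_K = 0.0019872041  # Boltzmann constant, kcal/(mol*K)

def eds_reference_energy(state_energies: Sequence[float], offsets: Sequence[float],
                         temperature: float = 300.0, s: float = 1.0) -> float:
    """EDS reference energy V_R = -(1/(beta*s)) * ln sum_i exp(-beta*s*(V_i - E_i^R))."""
    if len(state_energies) != len(offsets) or not state_energies:
        raise ValueError("need one offset per end state")
    beta = 1.0 / (KB_KCAL_PER_MOL_K * temperature)
    args = [-beta * s * (v - e) for v, e in zip(state_energies, offsets)]
    top = max(args)  # log-sum-exp shift for numerical stability
    log_sum = top + math.log(sum(math.exp(a - top) for a in args))
    return -log_sum / (beta * s)

def eds_with_shared_terms(common: float, site_terms: Sequence[float], offsets: Sequence[float],
                          temperature: float = 300.0, s: float = 1.0) -> float:
    """Same V_R when each state energy is common + site_terms[i]: the shared part factors out."""
    return common + eds_reference_energy(site_terms, offsets, temperature, s)
```

Only the state-specific terms enter the log-sum-exp, so adding protonation sites grows the cost of that step without recomputing the shared energy for every state.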
Speq: sparse equivariant neural networks for efficient and accurate additive QM/MM simulations using machine learning

Quantum mechanics/molecular mechanics (QM/MM) hybrid methods provide a powerful framework for simulating biomolecular systems: the chemically active region is described with quantum mechanics (QM) for accuracy, while the surrounding environment is treated with molecular mechanics (MM) for computational efficiency. Although QM delivers a detailed picture of electronic and chemical phenomena, its high cost makes simulations orders of magnitude slower than MM. Recent progress in machine learning (ML) has enabled neural network (NN) potentials to replace conventional QM calculations in QM/MM, offering substantial speedups. However, most NN-based QM/MM approaches rely on mechanical embedding schemes, which often sacrifice accuracy. In this work, we present Speq, a sparse equivariant neural network designed to advance additive QM/MM simulations. The architecture introduces sparsity by constructing graphs whose edges connect QM atoms to one another and to their MM environment, ensuring physically meaningful interactions. Equivariance is enforced through spherical harmonics-based features, enabling data-efficient learning while preserving rotational and other symmetry constraints. Benchmarks on benzene and uracil show that Speq reduces errors by an order of magnitude compared with existing NN-based QM/MM models, establishing a new standard for accuracy and efficiency in ML-accelerated multiscale simulations.

Trajax: a JAX-based tool for efficient trajectory analysis

In this work, we implemented algorithms for molecular dynamics trajectory analysis in JAX, a Python library for accelerated array computation. By leveraging JAX's core features (automatic differentiation, just-in-time (JIT) compilation, and GPU/TPU acceleration), our approach enables high-performance computation for complex analysis tasks.
While JAX's functional programming style and immutable arrays require a different coding paradigm than traditional NumPy, the Trajax package simplifies this by providing a user-friendly interface that handles these underlying design patterns. For seamless integration, we adopted the familiar MDAnalysis I/O interface, which lets us focus our efforts on computational efficiency. The package currently includes implementations of key calculations such as RMSD, PCA, time correlation analysis, and the radius of gyration, with more features under active development.
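To show what two of the listed analyses compute, here is a plain-Python reference sketch of the mass-weighted radius of gyration for one frame and a normalized time autocorrelation function for a scalar time series. It is written without JAX for readability, does not use the Trajax API, and the function names are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

Coord = Tuple[float, float, float]

def radius_of_gyration(coords: Sequence[Coord], masses: Optional[Sequence[float]] = None) -> float:
    """Mass-weighted Rg of one frame: sqrt(sum m_i |r_i - r_com|^2 / sum m_i)."""
    if masses is None:
        masses = [1.0] * len(coords)
    total = sum(masses)
    com = [sum(m * r[k] for m, r in zip(masses, coords)) / total for k in range(3)]
    sq = sum(m * sum((r[k] - com[k]) ** 2 for k in range(3)) for m, r in zip(masses, coords))
    return math.sqrt(sq / total)

def autocorrelation(series: Sequence[float], max_lag: int) -> List[float]:
    """Normalized C(tau) = <dx(t) dx(t + tau)> / <dx^2> for tau = 0..max_lag."""
    n = len(series)
    if not 0 <= max_lag < n:
        raise ValueError("max_lag must be in [0, len(series))")
    mean = sum(series) / n
    dev = [x - mean for x in series]
    var = sum(d * d for d in dev) / n
    if var == 0.0:
        raise ValueError("series has zero variance")
    result = []
    for lag in range(max_lag + 1):
        pairs = n - lag  # number of (t, t + lag) pairs available at this lag
        result.append(sum(dev[t] * dev[t + lag] for t in range(pairs)) / pairs / var)
    return result
```

In JAX, the same computations would be expressed with jax.numpy array operations over whole trajectories instead of Python loops and could then be compiled with jax.jit; that vectorized form is where GPU/TPU acceleration helps.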

View original record on NIH RePORTER →