III-SGER: Algorithms for Next-Generation Protein Modeling: Beyond Pair-wise Interactions
Georgia Tech Research Corporation, Atlanta GA
Investigators
Abstract
This work develops a new algorithmic framework that, for the first time, allows efficient computation of higher-order interactions in biomolecules. Algorithms are created to demonstrate two important applications at much larger scales than were previously tractable: Axilrod-Teller (3-body) simulation and Hartree-Fock (4-index) quantum-level simulation. Each opens the door to a larger class of further applications. The multidisciplinary project brings together experts in computer science, protein folding, and quantum chemistry.
Biomolecular simulations usually break down complex chemical systems into a balls-and-springs mechanical model augmented by torsional terms, pair-wise point-charge electrostatic terms, and simple pair-wise dispersion (van der Waals) interactions. However, such models often fail to capture important, complex non-additive interactions found in real systems. Although various authors have argued that multi-body potentials are critical for more accurate and realistic molecular modeling, evaluating them in systems beyond tiny sizes has not previously been possible, because no efficient method existed for the computation, whose cost is cubic or higher in the number of particles.
The work augments a framework for a class of computational problems called Generalized N-Body Problems, which includes any such higher-order physical potential. The framework was originally developed to accelerate common bottleneck statistical computations based on distances, using multiple kd-trees and other space-partitioning data structures to reduce computation times, both asymptotically and in practice, by multiple orders of magnitude. This work extends the framework with higher-order hierarchical series approximation techniques, demonstrating for the first time how to build a fast multipole-type method for higher-order interactions, effectively creating a Generalized Fast Multipole Method.
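To make the cubic cost concrete, the following is a minimal sketch of the brute-force evaluation of the Axilrod-Teller triple-dipole energy, which visits every triple of particles. It is an illustrative baseline only, not the project's tree-based method. The function name `axilrod_teller_energy` and the single shared coefficient `nu` are assumptions made for this example; real systems would typically use species-dependent coefficients.

```python
import itertools
import math


def axilrod_teller_energy(positions, nu=1.0):
    """Brute-force Axilrod-Teller (triple-dipole) energy.

    positions: sequence of (x, y, z) coordinates.
    nu: hypothetical single coefficient shared by all triples
        (an assumption for this sketch).

    Visits all N-choose-3 triples, so the cost grows as O(N^3).
    """
    total = 0.0
    for a, b, c in itertools.combinations(positions, 3):
        # Side lengths of the triangle formed by the three particles.
        r_ab = math.dist(a, b)
        r_ac = math.dist(a, c)
        r_bc = math.dist(b, c)
        # Interior-angle cosines from the law of cosines.
        cos_a = (r_ab**2 + r_ac**2 - r_bc**2) / (2.0 * r_ab * r_ac)
        cos_b = (r_ab**2 + r_bc**2 - r_ac**2) / (2.0 * r_ab * r_bc)
        cos_c = (r_ac**2 + r_bc**2 - r_ab**2) / (2.0 * r_ac * r_bc)
        # E = nu * (1 + 3 cos_a cos_b cos_c) / (r_ab * r_ac * r_bc)^3
        total += nu * (1.0 + 3.0 * cos_a * cos_b * cos_c) / (r_ab * r_ac * r_bc) ** 3
    return total


# Usage: three collinear particles with unit spacing.
collinear = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
energy = axilrod_teller_energy(collinear)
```

Because the loop touches every triple, doubling the number of particles multiplies the work by roughly eight, which is why direct evaluation becomes impractical beyond small systems.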
The algorithms are validated in biochemical systems chosen to clearly illustrate many-body interactions: hydrogen bonds and three-body dispersion interactions. Parameters for the potential functions are obtained using customized machine learning methods on two data sets generated by the co-PIs' labs: high-quality quantum mechanical benchmark data and experimental protein structures. The goal is to demonstrate working many-body codes that can explore the effect of modeling higher-order interactions at a larger scale and more systematically than previously attempted. The intellectual merit of the work is the elucidation of the first multi-tree multipole method capable of performing these fundamental types of higher-order physics computations accurately and scalably. The potential broader impact is the ability to perform more accurate next-generation molecular modeling, with implications for fundamental biology and drug design. For further information, see the project web page at http://www.cc.gatech.edu/~agray/gfmm.html