
CDI-Type I: Using Machine Learning to Develop New Approaches to Semiempirical Quantum Chemistry

$670,947 | FY2010 | MPS | NSF

Carnegie Mellon University, Pittsburgh PA

Abstract

The proposed work melds quantum chemistry with machine learning to develop efficient computational methods for predicting the electronic structure of chemical systems. The past decades have brought quantum chemistry to a point where highly accurate results can be routinely generated for small molecules. However, the computational cost increases rapidly with molecular size, making calculations on proteins or complex nanostructures challenging. This project takes advantage of molecular similarity, whereby molecular fragments behave similarly in different environments, to substantially lower the computational cost. First, a database of accurate but computationally expensive high-level results on the electronic structure of a molecular fragment in a range of environments is generated. This data is then used to develop a machine learning algorithm that uses information about the molecular fragment and its environment to predict the behavior of the fragment. The challenge for machine learning is to generalize to new fragments and environments, to integrate this generalization into the larger molecular simulation, and finally to characterize the performance to allow reporting of the confidence in the eventual simulation results. For example, if the learning algorithm works by breaking chemical space into regions that can be well described with low-cost approximating functions, the approach must characterize the boundaries of these regions and handle the transitions between the regions. This challenge will be addressed by a close integration of the chemistry and machine learning portions of the project, such that design decisions regarding the form of the approximating function and learning algorithm are made together. The ability to quickly and accurately generate the energy of a molecular system would have broad impact in domains such as biology and nanotechnology. 
Current computational approaches to large molecular systems rely on greatly simplified models of the energy, such as the ball-and-stick models of molecular mechanics. While such models are useful for predicting structure, functional predictions often involve the breaking and formation of chemical bonds, which calls for more realistic electronic structure approaches. The approaches developed here are designed to make realistic functional predictions for large systems computationally feasible. The close integration of chemistry and machine learning also provides excellent interdisciplinary training opportunities for both graduate and undergraduate students. This is a Cyber-Enabled Discovery and Innovation Program award and is co-funded by the Division of Chemistry and the Office of Multidisciplinary Activities.
