
Elements: FLARE infrastructure for reproducible active learning of Bayesian force fields for ex-machina exascale molecular dynamics

$379,050 · FY2020 · CSE · NSF

Harvard University, Cambridge MA

Abstract

Much-needed progress in technologies for energy storage and conversion relies on our ability to design and understand the next-generation functional materials at the core of these systems. The fundamental physical effects that govern the functions of batteries, catalysts and fuel cells originate at the atomic level. Molecular dynamics simulations are indispensable, broadly applicable tools for materials research because they can probe microscopic details of atomic motion and predict the thermodynamics, reaction kinetics and ionic diffusivities of many materials. Machine learning approaches are transforming how simulations of complex materials are performed; hence software tools are needed to make this transition faster and smoother. Our main goal is to advance machine learning methods and create software for constructing accurate, fast simulation models that provide principled uncertainty estimates for their predictions, a highly desirable target in many data science areas beyond atomistic modeling. Principled uncertainty quantification is especially critical for predicting non-equilibrium dynamics, where rare but important events, such as the breaking of bonds or atomic migration, determine the material’s performance but involve atomic configurations that are unlikely to appear in a naive unbiased training set. The tools that we aim to develop will accelerate and automate computational research efforts in fields such as catalysis, batteries, thermal coatings, soft structural and functional materials, and actuators. Currently available tools for machine-learning-based force fields rely on non-parametric methods that only provide estimates without uncertainties, require large amounts of training data, and are slow to evaluate for large numbers of atoms of different species.
Our goal is to develop community software infrastructure to enable a new paradigm of simulating non-equilibrium dynamics of complex materials, where ML models are trained automatically and dramatically accelerate ab-initio simulations on the fly, preserving exact physical symmetries with minimal accuracy loss. We will create and freely disseminate FLARE (Fast Learning of Atomistic Rare Events), a parallelized, database-driven automation framework that tightly couples ML model training with high-fidelity density functional theory (DFT) computations, using rigorous model uncertainty to guide data acquisition via closed-loop active learning. Specially designed many-body, multi-species kernels and tools for systematic hyperparameter optimization will allow the models to be mapped to fast tabulated Bayesian force fields, implemented in widely used MD software aimed at exascale computing performance. The result will be the ability to perform MD simulations of materials systems of millions of atoms at near-DFT accuracy and with predictive uncertainty. The unique advantages of the proposed infrastructure are (1) automated training that requires minimal amounts of DFT data, (2) predictions that carry principled Bayesian uncertainty, (3) scalable performance at least 5 orders of magnitude faster than ab-initio molecular dynamics, and (4) the ability to record full provenance and reproducibility information for training and prediction workflows. This award by the NSF Office of Advanced Cyberinfrastructure is jointly supported by the Division of Materials Research and the Division of Chemistry within the NSF Directorate of Mathematical and Physical Sciences and the Division of Chemical, Bioengineering, Environmental and Transport Systems within the NSF Directorate of Engineering. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
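The closed-loop active learning described in the abstract can be illustrated with a minimal sketch. It is not FLARE's actual API or kernel design: it assumes a one-dimensional toy "configuration", a plain squared-exponential Gaussian process written in NumPy, and a hypothetical `reference_energy` function standing in for an expensive DFT call. All names, the length scale, and the uncertainty threshold are illustrative assumptions. The loop asks the reference only when the model's predictive standard deviation exceeds the threshold, which is the general idea of uncertainty-guided data acquisition.

```python
import numpy as np


def reference_energy(x):
    # Hypothetical stand-in for an expensive DFT calculation (toy potential).
    return np.sin(3.0 * x) + 0.5 * x**2


def rbf(a, b, length=0.3):
    # Squared-exponential kernel with unit signal variance (assumed values).
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length) ** 2)


class ToyGP:
    """Minimal Gaussian process regressor with zero prior mean."""

    def __init__(self, jitter=1e-6):
        self.jitter = jitter
        self.x = np.empty(0)
        self.y = np.empty(0)

    def add(self, x, y):
        # Append one training point and refit via a Cholesky factorization.
        self.x = np.append(self.x, x)
        self.y = np.append(self.y, y)
        K = rbf(self.x, self.x) + self.jitter * np.eye(len(self.x))
        self.L = np.linalg.cholesky(K)
        self.alpha = np.linalg.solve(self.L.T, np.linalg.solve(self.L, self.y))

    def predict(self, x):
        # Return the predictive mean and standard deviation at one point.
        if len(self.x) == 0:
            return 0.0, 1.0  # prior mean and prior standard deviation
        k = rbf(self.x, np.atleast_1d(float(x)))[:, 0]
        mean = float(k @ self.alpha)
        v = np.linalg.solve(self.L, k)
        var = max(1.0 - float(v @ v), 0.0)
        return mean, float(np.sqrt(var))


def run_active_learning(trajectory, threshold=0.05):
    # Walk through the configurations; call the reference only when the
    # model is too uncertain, then add that point to the training set.
    gp = ToyGP()
    calls = 0
    predictions = []
    for x in trajectory:
        mean, std = gp.predict(x)
        if std > threshold:
            y = reference_energy(x)
            gp.add(x, y)
            calls += 1
            mean = y  # use the reference value for this step
        predictions.append(mean)
    return gp, calls, np.array(predictions)


# A toy "trajectory" that revisits the same region several times.
trajectory = np.sin(np.linspace(0.0, 4.0 * np.pi, 400))
gp, calls, predictions = run_active_learning(trajectory)
max_error = float(np.max(np.abs(predictions - reference_energy(trajectory))))
print(f"reference calls: {calls} of {len(trajectory)} steps")
print(f"max abs error: {max_error:.4f}")
```

In this sketch most reference calls happen the first time a region is visited; later passes over the same region reuse the trained model. A production framework would replace the scalar input with many-body descriptors of atomic environments and the toy function with actual DFT forces and energies.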
