
CDS&E: A Machine Learning Architecture for General, Reusable Models for Guest-Host Chemical Bonding

$428,540 · FY2022 · MPS · NSF

Tulane University, New Orleans LA

Investigators

Abstract

Matthew M. Montemore of Tulane University is jointly funded by an award from the Chemical Theory, Models and Computational Methods program in the Division of Chemistry and the Established Program to Stimulate Competitive Research (EPSCoR) to develop a new, more efficient machine learning framework for chemistry and materials science. While machine learning has been very useful for efficient screening of molecules and materials, previous approaches generally require a new machine learning model for each application. For example, a machine learning model that is used to design one type of battery usually cannot be used for other types of batteries. Creating a new model from scratch for each battery type requires significant time and effort to generate data and perform fitting. In the work supported here, Dr. Montemore and his research group will develop the capability to generate machine learning models that can be reused for many applications; for example, a single model could be used for designing many types of batteries. Broadly, this will be useful in many areas of chemistry and materials science, and the Montemore group will release user-friendly code and models to allow other research groups to effectively leverage the framework. Dr. Montemore is also advising the local chapter of the Society for Hispanic Engineers and working with Louisiana Dow Chemical to provide workshops and mentorship for its members. Additionally, Dr. Montemore is advising a number of undergraduate researchers, including several from underrepresented groups.

The new machine learning architecture that Dr. Montemore and his group will develop here is designed based on chemical principles, such as the existence of elements as discrete entities (and not as part of a continuous space). Briefly, the architecture uses latent (i.e., intermediate) variables to partially decouple different guests and different host elements, which greatly simplifies the learning task for each submodel. This is a significant departure from most screening approaches, which use off-the-shelf ML models to map continuous features onto a target variable and must learn chemical principles during fitting. The architecture is well-suited to handling heterogeneous data sets, such as a mixture of computational and experimental data. It also does not require very large data sets, in contrast to deep-learning approaches. The models are often interpretable, as predictions can be explained in terms of latent variables or host features. Finally, the architecture is well-suited to transfer learning, which uses a pre-trained model to accelerate the creation of new models. In summary, this architecture harnesses fundamental chemical principles, especially those present in applications involving guest-host chemical bonding, to significantly increase the efficiency of materials screening. Overall, the primary benefits of this approach are a significant speedup in materials screening through reusability, and the possibility of more sophisticated, effective screening by predicting multiple quantities.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
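
The abstract does not specify how the latent variables are implemented. As a rough illustration only, the idea of decoupling guests from host elements can be sketched as a low-rank factorization: each host element gets its own small latent vector (a discrete entry, not a point in a continuous feature space), and each guest gets a small linear submodel that reads those latent vectors. Everything in the block below is an assumption made for illustration, including the function name `fit_guest_host`, the linear form, the alternating ridge fit, and the synthetic data; it is not the Montemore group's architecture.

```python
import numpy as np

def fit_guest_host(E, mask, n_latent=2, n_iter=50, ridge=1e-3, seed=0):
    """Fit E[g, h] ~ W[g] @ Z[h] by alternating ridge regression.

    E    : (n_guests, n_hosts) array of target values (e.g. binding energies)
    mask : boolean array of the same shape, True where E is observed
    Hypothetical helper for illustration; not taken from the award.
    """
    rng = np.random.default_rng(seed)
    n_guests, n_hosts = E.shape
    W = rng.normal(size=(n_guests, n_latent))  # one small submodel per guest
    Z = rng.normal(size=(n_hosts, n_latent))   # one latent vector per host element
    reg = ridge * np.eye(n_latent)
    for _ in range(n_iter):
        # Update each host's latent vector with the guest submodels held fixed.
        for h in range(n_hosts):
            obs = mask[:, h]
            A = W[obs]
            Z[h] = np.linalg.solve(A.T @ A + reg, A.T @ E[obs, h])
        # Update each guest's submodel with the host latent vectors held fixed.
        for g in range(n_guests):
            obs = mask[g]
            A = Z[obs]
            W[g] = np.linalg.solve(A.T @ A + reg, A.T @ E[g, obs])
    return W, Z

# Synthetic demo data (made up for illustration, not from the award).
rng = np.random.default_rng(1)
true_W = rng.normal(size=(6, 2))    # 6 hypothetical guests
true_Z = rng.normal(size=(10, 2))   # 10 hypothetical host elements
E = true_W @ true_Z.T
mask = rng.random(E.shape) > 0.2    # roughly 80% of guest-host pairs "measured"

W, Z = fit_guest_host(E, mask)
pred = W @ Z.T                      # predictions for every guest-host pair
train_rmse = np.sqrt(np.mean((pred[mask] - E[mask]) ** 2))
print(f"RMSE on observed pairs: {train_rmse:.3f}")
```

In a sketch like this, reuse would amount to keeping `Z` fixed and fitting a single new row of `W` for a new guest from a handful of measurements, which is one plausible reading of the transfer-learning benefit the abstract describes.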
