Machine Learning Models for Interpreting Molecular Structure from Vacuum Ultraviolet Spectra
University Of Georgia Research Foundation Inc, Athens GA
Abstract
With support from the Chemical Measurement and Imaging (CMI) Program in the Division of Chemistry, Brandon Rotavera and Geoff Smith at the University of Georgia are developing new machine learning tools to help identify the structure of molecules from their gas-phase spectroscopy. The machine-learning models target >95% accuracy (based on validation experiments using molecules with known structure) to provide confidence in predicting critical details of molecular structure, particularly for elusive molecules that are important in chemical science and related engineering applications. This project is expected to have broader scientific impact by contributing new data-informed modeling tools with predictive capabilities that support innovative methods for identifying molecules important to photochemistry, chemical kinetics, chemical physics, combustion processes, and atmospheric chemistry. The project will provide research opportunities for graduate and undergraduate students, including veterans.
Data-enabled computational science such as machine learning (ML) offers critical insights for the ongoing development of sustainable energy technologies, which rely extensively on understanding the fundamental chemical mechanisms of elusive radicals that are central to next-generation biofuel combustion. The success of this effort depends on the ability to identify multi-functional intermediates, including substituted cyclic ethers, organic hydroperoxides, and other complex species. Isomer-resolved vacuum ultraviolet (VUV) spectroscopy is a cutting-edge tool for detecting such species via differential absorption coupled with mass spectrometry. This project uses such measurements to develop new data-enabled ML tools for analyzing and interpreting molecular structure. The resulting insights will facilitate the detection and recognition of chemical species relevant to tropospheric chemistry, combustion chemistry, and other areas.
Specifically, the Rotavera/Smith team is working to assign features of previously unassigned VUV absorption spectra to specific isomers and/or stereoisomers. The resulting chemical insights may make it possible to link isomers to specific reaction pathways on potential energy surfaces. These pathways underpin, for example, the numerical combustion models needed to accelerate the design of sustainable hybrid combustion systems. For this project, the principal investigators are using several promising ML methods to identify functional groups and other molecular motifs: (1) deep neural networks, (2) boosted decision trees, and (3) support vector machines (SVMs). Such methods will be particularly useful for identifying functional groups in molecules for which authentic standards are not commercially available and which are difficult or impossible to synthesize. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
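The abstract names support vector machines among the methods for identifying functional groups from VUV spectra. The sketch below illustrates that idea only. It is not the investigators' model, data, or pipeline. It makes several assumptions: a hypothetical binary task (does a molecule contain a given functional group?), synthetic toy spectra on a made-up 6.0 to 10.7 eV grid, and a made-up marker band at 7.5 eV (MARKER_EV). It trains a linear SVM (hinge loss plus L2 penalty) with Pegasos-style stochastic subgradient steps in plain Python. Labeled spectra are held out to estimate accuracy, in the spirit of the validation experiments the abstract describes. The accuracy it prints reflects toy data and says nothing about real VUV spectra.

```python
import math
import random

# Hypothetical photon-energy grid (eV); not taken from the project.
ENERGIES = [6.0 + 0.1 * i for i in range(48)]
MARKER_EV = 7.5  # hypothetical band position standing in for a functional group


def gaussian_band(center, width, amplitude):
    """Absorption band with a Gaussian profile sampled on ENERGIES."""
    return [amplitude * math.exp(-0.5 * ((e - center) / width) ** 2) for e in ENERGIES]


def synthetic_spectrum(rng, has_group):
    """Toy spectrum: noise + two random background bands + optional marker band."""
    spectrum = [rng.gauss(0.0, 0.05) for _ in ENERGIES]
    bands = [(rng.uniform(6.5, 10.5), 0.3, rng.uniform(0.1, 0.5)) for _ in range(2)]
    if has_group:
        bands.append((MARKER_EV, 0.2, rng.uniform(0.8, 1.2)))
    for center, width, amplitude in bands:
        band = gaussian_band(center, width, amplitude)
        spectrum = [s + b for s, b in zip(spectrum, band)]
    return spectrum


def make_dataset(rng, n):
    """Return (features, labels); label +1 means the group is present, -1 absent."""
    X, y = [], []
    for _ in range(n):
        label = 1 if rng.random() < 0.5 else -1
        # Append a constant 1.0 so the last weight acts as a bias term.
        X.append(synthetic_spectrum(rng, label == 1) + [1.0])
        y.append(label)
    return X, y


def train_linear_svm(X, y, lam=0.01, epochs=30, seed=0):
    """Linear SVM fitted with Pegasos-style stochastic subgradient descent."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    for t in range(1, epochs * len(X) + 1):
        i = rng.randrange(len(X))
        eta = 1.0 / (lam * t)  # decreasing step size
        margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))  # uses current w
        w = [(1.0 - eta * lam) * wj for wj in w]  # gradient step on the L2 term
        if margin < 1.0:  # hinge loss is active for this example
            w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def predict(w, x):
    """Return +1 (group predicted present) or -1 (absent)."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0.0 else -1


def accuracy(w, X, y):
    return sum(predict(w, x) == label for x, label in zip(X, y)) / len(y)


rng = random.Random(42)
X_train, y_train = make_dataset(rng, 300)
X_test, y_test = make_dataset(rng, 100)  # held-out "known structure" examples
w = train_linear_svm(X_train, y_train)
test_acc = accuracy(w, X_test, y_test)
print(f"held-out accuracy on toy data: {test_acc:.2f}")
```

Within the same held-out evaluation, `train_linear_svm` could be swapped for a boosted-tree or neural-network classifier. Real spectra would likely also need preprocessing onto a common energy grid, which is outside this sketch.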