D3SC: Identification of Products and Pathways in Organic Reactions

$500,000 | FY2020 | MPS | NSF

University of California-Irvine, Irvine, CA

Abstract

In this project, funded by the Chemical Structure, Dynamics & Mechanisms-B Program of the Chemistry Division, Professors David Van Vranken of the Department of Chemistry and Pierre Baldi of the Department of Computer Science at the University of California, Irvine are using deep learning to train a computer system called Reaction Predictor to identify plausible stepwise reaction pathways from the structures of the reactants. The goal of this research is to develop a computational tool that rapidly searches highly branched reaction pathways to identify the chemical structures of the products of organic reactions. Identifying the products of organic reactions would accelerate pharmaceutical synthesis, drug stability studies, and other advanced manufacturing requiring synthetic organic chemistry. The project lies at the interface of mechanistic organic chemistry, chemoinformatics, and machine learning. The training data is based on common depictions of organic reactions understandable by everyone from high school chemistry students to academic researchers. The team is well-positioned to provide education and training for students underrepresented in science. Sophomore organic chemistry students will be engaged in benchmarking the ongoing evolution of machine vs. human prediction as applied to organic chemistry reactions.

The first stage in predicting polar chemical reactivity is the identification of nucleophilic electron source atoms and electrophilic electron sink atoms. Chemists currently lack a database of nucleophilicity and electrophilicity parameters that covers the full span of nucleophilic functional groups, from carbon-carbon bonds to alkyl anions, and the full span of electrophilic functional groups, from carbon-carbon sigma antibonding orbitals to the empty orbitals of a cyanide cation. To fill this gap, methyl ion affinities are being calculated in a way that correlates them with solution-phase reactivity parameters, and those correlation values will be used to train Reaction Predictor in source and sink scoring. Dynamically adjusted thresholds will be evaluated to accommodate highly reactive or unreactive sources or sinks. The current training set of tens of thousands of elementary reaction steps will be greatly expanded. In addition, Reaction Predictor will be trained to rank the source-sink pairs occurring in any given set of reactants, and it will use these rankings to identify products and multistep reaction pathways for reactions carried out in the laboratory. Reaction Predictor will also be made available online, enabling a robust user community to participate in artificial intelligence as applied to organic chemistry.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
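The source and sink scoring, adjustable thresholds, and source-sink pair ranking described in the abstract can be illustrated with a minimal Python sketch. It is not Reaction Predictor's actual method. The `Site` type, the `select_sites` and `rank_pairs` helpers, the threshold-relaxation rule, and the use of a product of scores to rank pairs are all assumptions made for illustration, and the scores in the usage lines are placeholders rather than calculated affinities.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Site:
    """A candidate electron source or sink (hypothetical representation)."""
    label: str    # free-text description of the atom or orbital
    score: float  # placeholder reactivity score in [0, 1], not real data


def select_sites(sites, base_threshold=0.5, min_keep=1, step=0.1):
    """Keep sites scoring at or above a threshold.

    Assumed adjustment rule: if fewer than `min_keep` sites pass, lower the
    threshold by `step` until enough pass or the threshold reaches zero.
    """
    threshold = base_threshold
    kept = [s for s in sites if s.score >= threshold]
    while len(kept) < min_keep and threshold > 0:
        threshold = max(0.0, threshold - step)
        kept = [s for s in sites if s.score >= threshold]
    return kept


def rank_pairs(sources, sinks, top_k=3):
    """Rank source-sink pairs by the product of their scores (an assumption)."""
    pairs = product(select_sites(sources), select_sites(sinks))
    ranked = sorted(pairs, key=lambda p: p[0].score * p[1].score, reverse=True)
    return ranked[:top_k]


# Usage with placeholder scores (illustrative only).
sources = [Site("amine lone pair", 0.9), Site("alkene pi bond", 0.3)]
sinks = [Site("C-Br sigma antibonding orbital", 0.25)]
for source, sink in rank_pairs(sources, sinks):
    print(f"{source.label} -> {sink.label}")
```

In this sketch the only sink scores below the default threshold, so the threshold is relaxed until it passes. That is one simple way to accommodate an unreactive sink; a learned ranking model would replace the score product.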
