Collaborative Research: A Data-driven Closed-loop Framework for De Novo Generation of Molecules with Targeted Properties
University of Missouri-Columbia, Columbia, MO
Investigators
Abstract
Professors Jian Lin and Shih-Kang Chao of the University of Missouri-Columbia and Olexandr Isayev of Carnegie Mellon University are supported by an award from the Chemical Theory, Models and Computational Methods (CTMC) program in the Division of Chemistry. They will develop and apply a new data-driven architecture for designing novel molecules with desired physical and chemical properties. The project combines generative modeling, reinforcement learning, and active learning algorithms into a general methodology for solving a long-standing scientific challenge: property-oriented inverse molecular design. The methodology will improve understanding of molecular representations, provide a new route to exploring chemical space that is inaccessible by simple optimization of existing molecules, and offer insight into how the generative model learns chemical principles. The designed molecules, with multiple optimized properties (e.g., physicochemical, electronic, optical, and redox properties), will transform a variety of applications in medicine, photovoltaics, catalysis, thermal storage, and organic redox flow batteries. In addition, the interdisciplinary nature of this project will offer participating undergraduate and graduate students research experience in chemistry, materials science, statistics, and computer science. The project will also promote diversity in the STEM fields and the future workforce by increasing the participation of women in STEM disciplines and by improving STEM education in K-12 schools through outreach programs. Professors Lin, Chao, and Isayev will demonstrate a data-driven closed-loop framework for de novo generation of novel molecules with desired physicochemical properties in the extreme range.
The proposed research is motivated by three main challenges inherent in molecule generation: (i) generating novel molecules with targeted and quantifiable properties; (ii) generating molecules that meet multiple property objectives; and (iii) generating molecules with targeted properties beyond the range covered by the training dataset. To tackle these challenges, this collaborative team will develop an integrated data-driven methodology that combines reinforcement learning with a conditional generative adversarial network to design novel molecules with multiple targeted properties. The research team will combine the pipeline with active learning to enable an iterative closed-loop molecular development process, which will accelerate scientific progress in molecular discovery. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.