Data Science Guided Organic Reaction Development

$523,685 · R35 · FY2025 · GM · NIH

Utah State Higher Education System, University of Utah, Salt Lake City, UT

Abstract

Project Summary

The overarching objective of this program is twofold: (1) to develop, implement, and refine data science and machine learning (ML) workflows that guide and streamline reaction development, and (2) to democratize the use of open-access data science tools across the broader chemical community. Although the use of ML in chemical research has increased significantly over the past few years, such procedures and tools have not yet seen widespread implementation. With this proposal renewal, we aim to make data science workflows that use ML to enhance reaction development more accessible, in the context of organocatalysis and biocatalysis research. A major pillar of this work is producing statistical models that are both predictive and interpretive. Predictive models have highly practical impact on synthetic chemistry, including the optimization of reaction metrics in both target-oriented campaigns and general reaction development. The power of interpretive models is that they not only yield fundamental insight into the origins of chemical reactivity but also require a chemist in the loop, thereby providing the foundation for hypothesis-driven design and development. Success in using data science to build comprehensive models is predicated on high-quality data with a diverse distribution of outputs (yield, selectivity, etc.). Throughout the proposed research, we will focus on generating suitable data by (1) building descriptor libraries before initiating an experimental campaign, (2) using dimensionality reduction to construct chemical space maps, and (3) using algorithms to cluster and select molecules for evaluation. For organocatalytic processes, we plan to develop workflows to explore a range of reactions, including enantioselective access to chiral bioisosteres of amides and sulfonamides, merged with the design and development of new chiral phosphoric acid catalysts.
This integrated approach will allow interrogation of the structural and electronic features responsible for catalyst performance. We will also address the use of ML in biocatalysis by enhancing workflows that circumvent and intervene in directed evolution campaigns that have stalled because mutant performance has plateaued. This will be accomplished with algorithmic tools that broaden the data distribution and through the development of molecular features that adequately represent the complex nature of, and interactions between, enzyme and substrate. The tools we are developing are designed to be highly adaptable and will also be applied to modeling how drug molecules partition into biomolecular condensates. Overall, we will work with a diverse group of collaborators to achieve these broad goals and will make the tools publicly available for use by the broader community.
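The three data-generation steps named in the abstract (a descriptor library, dimensionality reduction into a chemical space map, and clustering to select molecules) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the project's actual workflow: the abstract does not name specific algorithms, so principal component analysis (PCA) and k-means clustering are stand-ins chosen for illustration; the descriptor matrix is synthetic random data rather than computed molecular descriptors; and the helper names `pca_map`, `kmeans`, and `select_representatives` are hypothetical. Only NumPy is used so the example is self-contained.

```python
import numpy as np


def pca_map(X, n_components=2):
    """Project a descriptor matrix (molecules x descriptors) onto its leading principal components."""
    std = X.std(axis=0)
    std[std == 0] = 1.0  # avoid dividing by zero for constant descriptors
    Xs = (X - X.mean(axis=0)) / std  # autoscale each descriptor
    # Rows of Vt are the principal axes, ordered by decreasing singular value
    _, _, Vt = np.linalg.svd(Xs, full_matrices=False)
    return Xs @ Vt[:n_components].T


def _assign(points, centroids):
    """Index of the nearest centroid for every point (Euclidean distance)."""
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)


def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means, initialised from k distinct data points."""
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        labels = _assign(points, centroids)
        # Move each centroid to its cluster mean; keep it in place if the cluster is empty
        new = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new, centroids):
            break
        centroids = new
    # Final assignment so labels always match the returned centroids
    return _assign(points, centroids), centroids


def select_representatives(points, labels, centroids):
    """For each non-empty cluster, return the index of the member closest to its centroid."""
    picks = []
    for j, c in enumerate(centroids):
        members = np.flatnonzero(labels == j)
        if members.size == 0:
            continue
        d = np.linalg.norm(points[members] - c, axis=1)
        picks.append(int(members[d.argmin()]))
    return picks


# Hypothetical stand-in for a descriptor library: 200 candidate molecules x 12 random descriptors
rng = np.random.default_rng(42)
descriptors = rng.normal(size=(200, 12))

coords = pca_map(descriptors)            # 2-D "chemical space map"
labels, centroids = kmeans(coords, k=8)  # partition the map into 8 regions
picks = select_representatives(coords, labels, centroids)
print("molecules to evaluate:", picks)
```

Choosing the member nearest each cluster centroid is one simple way to spread a small experimental budget across the whole map rather than sampling a single dense region; in a real campaign the descriptor set, the number of retained components, and the number of clusters would be chosen for the chemistry at hand.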

Original record: NIH RePORTER