Development of a machine learning pipeline for assisting strain design of nonmodel yeasts
Washington University, Saint Louis MO
Investigators
Abstract
Some yeast strains produce relatively large amounts of oils and fats, whose structures resemble those of many pharmaceuticals and chemicals. These strains are often also capable of high growth rates and can be grown on waste plant biomass. Because these yeasts can also be genetically engineered, they could serve as a platform for low-cost biomanufacturing processes. Before that can be achieved, additional details of their metabolic mechanisms must be uncovered. The overall objective of this project is to develop a model that can predict product yield from biomanufacturing processes that use genetically modified yeasts. This project also includes summer research programs for high school and undergraduate students. One point of emphasis will be a partnership with Lincoln University, an HBCU. This partnership will help to develop a diverse workforce well trained in key aspects of the emerging bioeconomy: artificial intelligence, bioinformatics, and synthetic biology.
Synthetic biology tools can engineer microbes to produce many target products. Multiple Design-Build-Test-Learn (DBTL) cycles are required to resolve bottlenecks that result from both pathway engineering and stressful cultivation conditions. However, the effectiveness of DBTL often drops after the initial cycles, and strain development may fall into “involutions” without further technology breakthroughs. To overcome this hurdle, the Washington University/RPI team will work with Sandia National Lab and Pacific Northwest National Lab to develop an AI-enhanced biomanufacturing route. First, they will perform knowledge mining of the oleaginous yeast literature and the ABF data repository. The extracted information (including strain engineering, fermentation conditions, and production metrics) will be converted into a structured database. The database will be used to train machine learning (ML) models that predict productivity from engineered constructs under various bioreactor conditions.
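As a rough illustration of the database-to-prediction step described above, the sketch below stores extracted data points as structured records and averages two very simple predictors, which is one basic form of ensemble learning. It is not the project's actual schema or model: the record fields, the strain labels, the two base predictors, and the numeric values are all hypothetical placeholders chosen only to make the example runnable.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class FermentationRecord:
    """One extracted data point. All field names are hypothetical."""
    strain: str
    product: str
    temperature_c: float
    ph: float
    titer_g_per_l: float


# A predictor maps a feature vector [temperature_c, ph] to a titer estimate.
Predictor = Callable[[Sequence[float]], float]


def features(record: FermentationRecord) -> List[float]:
    """Turn a record into the numeric features used by the toy models."""
    return [record.temperature_c, record.ph]


def fit_mean(train: Sequence[FermentationRecord]) -> Predictor:
    """Baseline model: always predict the mean titer of the training set."""
    average = mean(r.titer_g_per_l for r in train)
    return lambda x: average


def fit_nearest(train: Sequence[FermentationRecord]) -> Predictor:
    """1-nearest-neighbour model: return the titer of the closest record."""
    def predict(x: Sequence[float]) -> float:
        closest = min(
            train,
            key=lambda r: sum((a - b) ** 2 for a, b in zip(features(r), x)),
        )
        return closest.titer_g_per_l
    return predict


def fit_ensemble(train: Sequence[FermentationRecord]) -> Predictor:
    """Ensemble by simple averaging of the base models' predictions."""
    models = [fit_mean(train), fit_nearest(train)]
    return lambda x: mean(m(x) for m in models)


# Made-up placeholder values, not measurements from any study.
TOY_RECORDS = [
    FermentationRecord("strain_A", "butanol", 28.0, 5.5, 1.0),
    FermentationRecord("strain_A", "butanol", 30.0, 6.0, 2.0),
    FermentationRecord("strain_B", "butanol", 32.0, 6.5, 3.0),
]

model = fit_ensemble(TOY_RECORDS)
prediction = model([30.0, 6.0])  # query: 30 degrees C, pH 6.0
```

A real pipeline would need far richer features (genetic modifications, feedstock, bioreactor mode) and stronger base models; the point here is only the shape of the data flow from structured records to an averaged prediction.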
Then, the team will integrate ML and computational strain design models to guide yeast strain development for the synthesis of biofuels (e.g., butanol) and natural products (e.g., flavonoids). Based on model predictions, the RPI team and the national labs will perform novel enzyme engineering, promoter tuning, and CRISPRi pathway modification to improve yeast fermentation titers. The experimental tests will validate the models and further improve their applicability. The objectives of this project are fourfold: (1) develop rules that extract and standardize biomanufacturing information from different sources; (2) advance AI technology (such as meta-learning and ensemble learning) for the prediction of non-model yeast fermentation outcomes; (3) integrate genome-scale modeling, computational strain design, and ML to improve strain design under complex bioreactor conditions, minimize DBTL cycles, and reduce the cost of experimental trials; and (4) engineer three non-model yeast species for bioproduction from sustainable feedstocks. Success in this project could greatly facilitate the translation of laboratory strains into industrial producers.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.