III: Small: Collaborative Research: Analysis of Multi-Dimensional Protein Design Spaces with Pareto Optimization of Experimental Designs
Purdue University, West Lafayette IN
Abstract
In developing variants of natural proteins with improved properties and activities, protein engineers are confronted with large, complex design spaces. The degrees of freedom for producing variants mirror those found in nature but can be specifically targeted experimentally: choosing parent proteins, replacements for some amino acids (site-directed mutation), and locations for crossing over between parents (site-directed recombination). A set of choices, constituting a design, can be evaluated by multiple disparate criteria, including consistency with evolutionary information, energetic favorability with respect to a three-dimensional structure, and incorporation of specific characteristics distinguishing functional subclasses. Unfortunately, the different evaluation metrics may be complementary or even contradictory, and the prior information on which they are based is incomplete, so the metrics predict the real-life quality of the designs only approximately. The overall goal of this project is to develop efficient methods to characterize complex protein design spaces and optimize high-quality designs for experimental evaluation. A combinatorial protein engineering approach will be pursued, experimentally constructing a library of related variants and assaying them for properties of interest. Potential scores will evaluate a possible library (without explicitly enumerating its members) with respect to prior information from sequence, structure, and functional subclass. To account for disparate evaluation metrics, design algorithms will focus on the identification of Pareto optimal designs: those for which no other design is at least as good with respect to all desired criteria and strictly better with respect to at least one. To account for incomplete prior information, design algorithms will trade off between exploitation of the prior information and broader exploration of the design space, seeking to identify a diverse set of designs, each with a diverse set of variants.
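The notion of Pareto optimality used above can be illustrated with a minimal sketch. The design names, the three score columns, and the convention that higher scores are better are all assumptions made for illustration; they are not taken from the project, and the brute-force pairwise filter shown here is not the project's design algorithm.

```python
from typing import Dict, List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if score vector a dominates b.

    Assumes higher is better on every criterion: a must be at least as
    good as b everywhere and strictly better on at least one criterion.
    """
    at_least_as_good = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return at_least_as_good and strictly_better


def pareto_front(scores: Dict[str, Sequence[float]]) -> List[str]:
    """Return the names of designs that no other design dominates."""
    return [
        name
        for name, s in scores.items()
        if not any(
            dominates(t, s) for other, t in scores.items() if other != name
        )
    ]


# Hypothetical designs scored on three made-up criteria
# (for example sequence, structure, and subclass scores).
designs = {
    "design_A": (0.9, 0.2, 0.5),
    "design_B": (0.4, 0.8, 0.6),
    "design_C": (0.3, 0.7, 0.5),  # dominated by design_B
    "design_D": (0.9, 0.2, 0.4),  # dominated by design_A
}

front = pareto_front(designs)
print(front)
```

Here design_A and design_B survive because each beats the other on some criterion, which is the kind of trade-off an engineer would then weigh by hand. The pairwise comparison is quadratic in the number of designs, so it is only practical for small, explicitly enumerated sets.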
Markov chain Monte Carlo sampling algorithms will characterize the overall design space by generating choices for the degrees of freedom and evaluating the designs with the potential scores, using the scores and diversity metrics to appropriately explore the space. Exact algorithms will more precisely focus on regions of interest, dividing and conquering the design space and employing combinatorial optimization algorithms to identify Pareto optimal designs. The design space approach provides a powerful new mechanism to address protein engineering applications, enabling the engineer to explicitly evaluate and optimize for trade-offs among important criteria and considerations. Interactive tools will help engineers navigate through the regions of interest, visualize designs and perform "what-if" analyses, and compare and contrast Pareto optimal designs. A design space repository will enable sharing of analyses and underlying data. The tools and repository will support protein engineering for a range of activities in the national interest, including biosensors, production of novel biological therapeutics, and novel enzymes for green chemical synthesis, energy extraction, and bioremediation. As part of the project, the mechanism will be put to use in the engineering of soluble and robust cytochrome P450s that, without the need for living cells or protein cofactors, employ inexpensive and non-toxic hydrogen peroxide to hydroxylate steroids and multi-ring compounds that mimic estrogenic (feminizing) steroids in the environment. Such enzymes would be valuable as tools for chemical synthesis, waste treatment, and bioremediation.
This project provides an ideal venue to impart cross-disciplinary training to students by illustrating how computational techniques can be fruitfully integrated with experimentation in answering important biological questions. Aspects of the project will be used in both undergraduate and graduate courses, from an introductory biology course to an advanced bioinformatics course. The project itself will provide the opportunity for interdisciplinary research training for graduate and undergraduate students, including those from underrepresented groups.