
BCSP: ABI Innovation: Collaborative Research: Predicting changes in protein activity from changes in sequence by identifying the underlying Biophysical Conditional Random Field

$411,434 | FY2014 | BIO | NSF

The Ohio State University, Columbus, OH


Abstract

Proteins are the molecular machines responsible for a vast array of functions necessary for life. Understanding how they work is critical both to a better scientific understanding of the fundamental processes of life and to modifying or improving their function. Although proteins are physically three-dimensional structures of cooperating parts, the current state of the art for representing and studying proteins uses a description that is simply a sequential list of the parts used in their assembly. This sequential-list style of description has biased the development of tools for protein analysis toward accentuating the sequential properties of these molecules and ignoring the fact that the parts must work together in unison for the protein to function. This project will adapt two recent developments to the task of describing proteins: a statistical technique, the Conditional Random Field (CRF), that can quantitatively represent densely connected networks of features, and a visualization tool that enables interactive exploration of these networks. Structurally, Conditional Random Fields appear to recapitulate the process by which evolution has selected for parts that cooperate in proteins, and protein descriptions based on CRFs will be able to predict whether a change to a protein (a mutation) would have been tolerated by evolution or selected against as non-functional. This information will aid in predicting the effect of one or more mutations to a protein, using much more of the available information than state-of-the-art tools currently use. This work will broadly impact the study of proteins, improving a range of activities from basic scientific studies of function to endeavors in protein engineering.
In addition, the "change in protein sequence to change in protein function" problem is a "model organism" for many other types of biological and non-biological systems where rich interactions between parts of the system demand a sophisticated statistical approach. To date, in most of these fields, the de facto standard models are limited in the same way as those currently used for proteins. Developing the tools necessary for applying CRFs to protein data, and methods for establishing testable ground truth in this system, will enhance the application of CRFs to many other domains where they may provide a significant advantage over current methods. The products of this project will be made freely available to the research community as online tools, and the methods will be incorporated into coursework, first in the Biophysics Graduate Program at The Ohio State University and, as the teachable component matures, as lesson-plan material appropriate for both primary and secondary education. By developing a tool that makes interdependencies between features visually explorable and modifications of these dependencies quantifiably predictable, we will promote more thorough consideration of the true complexity of data and systems in many domains.
