Three-dimensional Structures Of Biological Macromolecules
National Heart, Lung, And Blood Institute
Abstract
Protein structure prediction via deep learning of protein folding
Protein structure prediction (PSP) has long been a central problem in biochemistry, driven by the dogma that sequence determines structure and structure determines function. Modern PSP systems generally comprise four components: (i) an input module (Section Inputs) that takes a single protein sequence and generates additional input features, almost always including a multiple sequence alignment (MSA) of homologous proteins, (ii) a trunk (Section Trunks), typically a neural network capable of sophisticated pattern recognition, which transforms features from the input module into spatial information that partially encodes the 3D structure, (iii) an output module (Section Outputs) that converts this spatial information into an initial 3D structure, sometimes without explicit side-chain atoms, and (iv) a refinement module (Section Refinement) that improves the initial structure and produces all atomic coordinates. Traditionally, these modules relied on a mixture of physics-based energy functions, knowledge-based statistical reasoning, and heuristic algorithms. The last few years, however, have witnessed an infusion of machine learning (ML), particularly neural networks, into every aspect of PSP. What started as a trickle of progress accelerated over the subsequent decade and, last year, reached a crescendo with DeepMind's AlphaFold2 [14], a system that arguably solves single apo domain PSP.
Current ML methods for PSP use a binary contact map (BCM) or discretized inter-residue distances as output, and all of their information comes from existing structures. Adding further sources of information should therefore benefit ML for PSP. We believe that protein folding pathways can provide abundant information of this kind. We therefore employ ML to recognize the folding movement at every stage of the folding pathway and to produce the movement of the protein. We use nudged elastic band simulations to generate a pathway from the extended state to the folded state, and the movement of the protein at each conformation along that pathway is studied with ML. This work is still in progress and will hopefully lead to more accurate PSP.
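As an illustration of the two output representations mentioned above, the following minimal sketch derives a binary contact map and discretized inter-residue distances from a set of residue coordinates. It is not the project's pipeline: the function names, the 8 Å contact threshold, the 2 to 22 Å binning range, and the toy coordinates are all assumptions chosen for the example.

```python
import math

def pairwise_distances(coords):
    """Return the N x N matrix of Euclidean distances between residue coordinates."""
    return [[math.dist(a, b) for b in coords] for a in coords]

def binary_contact_map(dist, threshold=8.0):
    """Mark a pair as in contact (1) when its distance is below `threshold` angstroms.

    The 8 angstrom cutoff is an assumed, commonly used convention.
    """
    return [[1 if d < threshold else 0 for d in row] for row in dist]

def discretize_distances(dist, d_min=2.0, d_max=22.0, n_bins=10):
    """Map each distance to a bin index in [0, n_bins - 1].

    Distances below d_min or at/above d_max are clamped to the first or last bin.
    The range and bin count are hypothetical values for illustration.
    """
    width = (d_max - d_min) / n_bins

    def bin_index(d):
        idx = int((d - d_min) // width)
        return max(0, min(n_bins - 1, idx))

    return [[bin_index(d) for d in row] for row in dist]

# Toy trace: four residues spaced 3.8 angstroms apart along a straight line
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
dist = pairwise_distances(coords)
bcm = binary_contact_map(dist)
bins = discretize_distances(dist)
```

In this toy chain the first and last residues are 11.4 Å apart, so they are the only pair not marked as a contact, while the distance bins retain the finer-grained separation that the binary map discards.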