Accurate Modeling in Structural Genomics

$335,294 | R01 | FY2015 | GM | NIH

Stanford University, Stanford CA

Abstract

DESCRIPTION (provided by applicant): Complexes of proteins and nucleic acids are the macromolecular machines at the heart of modern structural biology. We believe that structures of large macromolecular complexes can be solved with less experimental data and at higher throughput. Structures are still solved using methods invented decades ago, and model dependency causes severe problems for large systems solved at low resolution. Preliminary studies done during the previous funding period show that a way forward is to build a very large number of different models and then test these models directly against the experimental data. This approach has allowed us to assign sequence to a known backbone using much less data than is the norm. Preliminary results show that, with suitable built-in statistical controls, this unbiased approach works well for both low-resolution X-ray data and mass spectrometry with a small number of experimental cross-links. Our approach is innovative: it determined the detailed atomic structure of chaperonin CCT/TRiC, a 950 kilodalton, 8-gene quasi-degenerate system that could not be solved by conventional methods of cryo-EM or X-ray crystallography. Driven by the central hypothesis that unbiased methods solve structures with less information and at higher throughput, we have 3 specific aims:

1. Facilitate structure determination by cross-linking and mass spectrometry (XL+MS). With optimized protocols, XL+MS will be applied to the PIC, RIG-I and RdRp systems studied by colleagues at Stanford.

2. Determine and refine the spatial arrangement of macromolecular domains and subunits with cryo-electron microscopy (cryo-EM). After the methods are calibrated on open-form chaperonin CCT, they will be applied to the systems above to fit mass spectrometry and cryo-EM data simultaneously.

3. Position side chains with R-value exploration of low-resolution X-ray data. All-atom combinatorial homology models will be generated using best practices consistent with the need to generate millions of models. The fit between the X-ray data calculated from a model and the observed data (the R-value) will be optimized in an attempt to assign amino acids not seen in low-resolution structures to backbone C-alpha positions.

Given the central role of structural biology in medical science, our work, if successful, could produce useful structures at higher throughput. Because the approach relies strongly on computational resources, which continue to drop exponentially in cost, these results would be obtained with fewer resources and in less time. Our work would also advance detailed functional and biological studies that are hampered by lack of confidence in side chain positions. The positive impact could be broader, in that other problems in structural and systems biology could benefit from the key principle of our approach: eliminate bias by examining millions of possible models that are all equivalent and built to the same consistent specifications. This set of structures then provides a statistical sanity check, showing how much better the best model is than the next best one.
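The R-value comparison and the "best versus next best" sanity check described above can be illustrated with a minimal sketch. The abstract does not give a formula, so this assumes the conventional crystallographic R-factor, R = sum(|Fobs - k*Fcalc|) / sum(Fobs), with a simple linear scale k. The function names, the synthetic amplitudes, and the z-score used as the margin statistic are all hypothetical choices for illustration, not the applicants' actual protocol.

```python
import random
import statistics


def r_value(f_obs, f_calc):
    """Conventional R-factor between observed and calculated amplitudes.

    Assumption: a single linear scale factor k = sum(Fobs) / sum(Fcalc).
    """
    scale = sum(f_obs) / sum(f_calc)
    return sum(abs(o - scale * c) for o, c in zip(f_obs, f_calc)) / sum(f_obs)


def rank_models(f_obs, candidates):
    """Score every candidate model and return (name, R) pairs, best first."""
    scored = ((name, r_value(f_obs, f_calc)) for name, f_calc in candidates.items())
    return sorted(scored, key=lambda pair: pair[1])


def best_model_margin(ranked):
    """Return (gap, z): how far the best R sits below the other models.

    gap is R(second best) - R(best); z measures the best R against the mean
    and standard deviation of all remaining models (hypothetical statistic).
    """
    r_values = [r for _, r in ranked]
    best, rest = r_values[0], r_values[1:]
    gap = rest[0] - best
    z = (statistics.mean(rest) - best) / statistics.stdev(rest)
    return gap, z


# Synthetic demonstration: one near-correct model among 50 noisy decoys.
rng = random.Random(0)
f_obs = [rng.uniform(10.0, 100.0) for _ in range(200)]
candidates = {"correct": [f * rng.uniform(0.95, 1.05) for f in f_obs]}
for i in range(50):
    candidates[f"decoy_{i}"] = [f * rng.uniform(0.5, 1.5) for f in f_obs]

ranked = rank_models(f_obs, candidates)
gap, z = best_model_margin(ranked)
print(ranked[0], gap, z)
```

Because every candidate is scored by the same function against the same data, the population of decoy scores acts as a built-in control: a best model that is only marginally better than the rest would show a small gap and a low z.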
