A UNIFIED MULTITASK ARCHITECTURE FOR PREDICTING LOCAL PROTEIN PROPERTIES

$21,369 | P41 | FY2011 | RR | NIH

University of Washington, Seattle, WA

Investigators

Abstract

This subproject is one of many research subprojects utilizing the resources provided by a Center grant funded by NIH/NCRR. Primary support for the subproject and the subproject's principal investigator may have been provided by other sources, including other NIH sources. The Total Cost listed for the subproject likely represents the estimated amount of Center infrastructure utilized by the subproject, not direct funding provided by the NCRR grant to the subproject or subproject staff.

A variety of functionally important protein properties, such as secondary structure, transmembrane topology and solvent accessibility, can be encoded as a labeling of amino acids. Indeed, the prediction of such properties from the primary amino acid sequence is one of the core projects of computational biology. Accordingly, a panoply of approaches have been developed for predicting such properties; however, most such approaches focus on solving a single task at a time. Motivated by recent, successful work in natural language processing, we propose to use multitask learning to train a single, joint model that exploits the dependencies among these various labeling tasks. We describe a deep neural network architecture that, given a protein sequence, outputs a host of predicted local properties, including secondary structure, solvent accessibility, transmembrane topology, signal peptides and DNA-binding residues. The network is trained jointly on all these tasks in a supervised fashion, augmented with a novel form of semi-supervised learning in which the model is trained to distinguish between local patterns from natural and synthetic protein sequences. The task-independent architecture of the network obviates the need for task-specific feature engineering. We demonstrate that, for all of the tasks that we considered, our approach leads to statistically significant improvements in performance relative to a single-task neural network approach, and that the resulting model achieves state-of-the-art performance.
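
The shape of the idea described above can be illustrated with a small sketch. This is not the authors' implementation: it is a forward pass only, with untrained random weights, written in plain Python. The window width, layer sizes, task label sets, and the names `windows`, `corrupt`, and `MultitaskNet` are all hypothetical. In particular, the abstract does not say how synthetic sequences are produced; swapping the center residue of a natural window is an assumption made here for illustration.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAD = "-"  # padding symbol for positions beyond the sequence ends
ALPHABET = AMINO_ACIDS + PAD

# Hypothetical label sets; the abstract names the tasks but not their classes.
TASKS = {
    "secondary_structure": ["helix", "strand", "coil"],
    "solvent_accessibility": ["buried", "exposed"],
    "natural_vs_synthetic": ["natural", "synthetic"],
}


def windows(sequence, width=5):
    """Return one fixed-width window centered on each residue."""
    half = width // 2
    padded = PAD * half + sequence + PAD * half
    return [padded[i:i + width] for i in range(len(sequence))]


def corrupt(window, rng):
    """Make a synthetic window by swapping the center residue (assumed scheme)."""
    mid = len(window) // 2
    choices = [a for a in AMINO_ACIDS if a != window[mid]]
    return window[:mid] + rng.choice(choices) + window[mid + 1:]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class MultitaskNet:
    """Shared embedding and hidden layer, plus one output layer per task."""

    def __init__(self, width=5, embed_dim=4, hidden_dim=8, seed=0):
        rng = random.Random(seed)
        self.embed = {a: [rng.uniform(-0.1, 0.1) for _ in range(embed_dim)]
                      for a in ALPHABET}
        self.hidden = random_matrix(hidden_dim, width * embed_dim, rng)
        self.heads = {t: random_matrix(len(c), hidden_dim, rng)
                      for t, c in TASKS.items()}

    def shared(self, window):
        # Concatenate residue embeddings, then apply the shared hidden layer.
        features = [x for residue in window for x in self.embed[residue]]
        return [math.tanh(h) for h in matvec(self.hidden, features)]

    def predict(self, window):
        # Every task head reads the same shared representation.
        rep = self.shared(window)
        return {t: softmax(matvec(w, rep)) for t, w in self.heads.items()}


net = MultitaskNet()
seq = "MKTAYIAKQR"  # arbitrary example sequence
wins = windows(seq)
per_residue = [net.predict(w) for w in wins]  # one label distribution per task, per residue
synthetic = corrupt(wins[4], random.Random(1))  # negative example for the auxiliary task
```

In a trained version of such a model, gradients from every task would update the shared embedding and hidden layer, which is where the benefit of joint training would come from. The natural-versus-synthetic head needs no annotated labels, so it could in principle be trained on unlabeled sequences.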
