Algorithms for protein superfamily classification and function prediction

$244,046 · R00 · FY2011 · RR · NIH

University Of Alabama At Birmingham, Birmingham AL

Abstract

The goal of this project is to develop new algorithms for protein function prediction. Rapid recent technological advances are producing biological data of unprecedented volume and complexity, and computational methods are becoming essential components of modern biomedical research. One of the greatest challenges facing bioinformaticians is discovering connections among different data sets and generating novel biological knowledge or hypotheses. Predicting the molecular function of novel proteins is an urgent task for the post-genomics era. In particular, recent assessments of structural genomics efforts revealed a gap between experimental protein structure determination and the use of that structural knowledge to understand the biological function of proteins at the molecular level. We will employ recent developments in discriminative machine learning to construct a residue-level classification system for function prediction from structure. Existing systems for function prediction from structure use either global structural and sequence similarities over the entire protein chain or localized similarities such as putative functional sites. Our system will leverage information from both global and local similarities and identify important residues and clusters of residues that are distinctive among different functional families. Our approach is based on, and extends, an efficient optimization framework that we developed for protein superfamily classification. We expect that these methodological developments will not only improve the performance of state-of-the-art function prediction but also help illuminate the interplay of sequence and structure in defining functional variations among protein families.
Beyond this major project, we will work on an additional project that extends the graph-theoretical models for multiple sequence alignment that we developed earlier to meet the challenge of domain annotation for large new sequence sets.
