
SIFTER: A Systems Biology Platform for Protein Function Prediction

$240,000 | FY2011 | CSE | NSF

Talwalkar Ameet S, Oakland CA

Investigators

Abstract

Proteins are key biomolecules involved in virtually all cellular processes, such as metabolism, cell signaling, and immune response, and knowledge of protein function is vital to a basic understanding of cellular activity. Due to recent advances in nucleotide sequencing technology, the number of available genomic sequences is doubling roughly every 12 months, a pace vastly exceeding Moore's law. Experimental technologies required to decipher protein function have not progressed nearly as fast. In fact, although there are roughly 10 million protein sequences in the comprehensive UniProt database, only 0.2% have experimentally validated function annotations. This sequence-function gap is rapidly expanding, and the development of computational methods is crucial to making effective use of this deluge of sequence data. In this work, we develop SIFTER, a large-scale systems biology platform to accurately predict protein function from high-throughput data. Building upon a promising phylogenomics-based prototype, we incorporate interaction networks into our model to improve performance. Interaction data intrinsically couple the thousands to millions of proteins within such networks, and we use variational inference and parallelized implementations to address this challenging computational problem. We also explore techniques for function prediction based on low-rank matrix factorization and, along the way, introduce novel sampling-based approaches to speed up computation. Additionally, we develop algorithms to quantify uncertainty in SIFTER's predictions to help guide future experimental work. These novel algorithms are large-scale extensions of classical bootstrap sampling and are generally applicable to any problem involving massive data. Finally, we evaluate SIFTER in collaboration with experimental biologists, allowing us to pinpoint relevant use cases and resulting in an effective method with widespread impact within the biomedical community.

View original record on NSF Award Search →