Surface Shape Based Screening of Large Protein Databases

$295,821 · R01 · FY2007 · GM · NIH

Purdue University, West Lafayette IN

Abstract

DESCRIPTION (provided by applicant): In the structural genomics era, there is a need to extract and represent the 3D shapes of protein binding sites so that the information can be reused in a robust and simple manner. The long-term objective of this proposal is to develop a set of computational algorithms and a database that use local surface shape signatures of proteins to predict protein function and protein-protein docking. Storing precalculated binding sites in a database allows fast screening and comparison. Algorithms for identifying, representing, comparing, clustering, and docking local surface shape signatures of proteins will be developed. To identify binding sites, a visibility-based algorithm will be used, which can identify both cavity and protrusion regions. To represent identified binding sites, three hierarchical levels of representation are proposed. The simplest representation uses feature points that capture the global or local maxima and minima of mean curvature, the radius, and the depth of a binding site. The second level uses a histogram-based method that captures relative distances between the feature points of a binding site. The third level employs a voxelization method. The database of binding sites will employ R-tree-based multidimensional indexes to allow real-time screening and clustering of binding sites. A Self-Organizing Map is also planned for clustering, which allows dynamic updating of clusters. Hierarchical clusters precalculated by the R-tree or the Self-Organizing Map provide a framework for dynamic clustering displayed through a zoomable user interface. A fast geometric hashing-based protein-protein docking algorithm will be developed, which uses precalculated binding sites to reduce the search space. A new invariant basis will be used in the hashing step to reduce the complexity of the algorithm from O(n^3) to O(n^2). A novel non-uniform hash table, tolerant to small errors or changes in parameters, will be used in the hashing. The methodology will be extended to handle predicted structures with possible errors. The active site identification methods will be applied to predict the function of proteins of unknown function whose structures were determined by structural genomics projects. The docking algorithm will be applied to protein-protein interaction data of E. coli.
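
The second, histogram-based level of representation mentioned in the abstract can be illustrated with a minimal sketch. It bins the pairwise distances between the feature points of a binding site, which yields a signature that does not change when the site is rotated or translated. The abstract does not specify the binning scheme or the comparison metric, so the function names, the bin width, the number of bins, the L1 comparison, and the sample coordinates below are all assumptions made for illustration, not the proposal's actual method.

```python
import math
from itertools import combinations


def distance_histogram(points, bin_width=2.0, n_bins=10):
    """Normalized histogram of pairwise distances between 3D feature points.

    bin_width and n_bins are illustrative assumptions, not values from the proposal.
    """
    counts = [0] * n_bins
    for p, q in combinations(points, 2):
        d = math.dist(p, q)  # Euclidean distance (Python 3.8+)
        # Distances beyond the last bin edge are clamped into the last bin.
        idx = min(int(d // bin_width), n_bins - 1)
        counts[idx] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * n_bins


def histogram_distance(h1, h2):
    """L1 distance between two histograms (an assumed comparison metric)."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


# Hypothetical feature points of a binding site, in angstroms.
site_a = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 4.0, 0.0), (0.0, 0.0, 5.0)]

# The same site rotated 90 degrees about the z axis and shifted by (10, 10, 10).
site_b = [(10.0 - y, 10.0 + x, 10.0 + z) for x, y, z in site_a]

hist_a = distance_histogram(site_a)
hist_b = distance_histogram(site_b)

# Pairwise distances are preserved by rigid motion, so the signatures match.
print(histogram_distance(hist_a, hist_b))
```

Because only inter-point distances enter the histogram, two sites can be compared without first superimposing them, which is what would make such a signature suitable as a key for fast database screening.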
