Comparison of Protein Sequences and Structures

$333,081 | R01 | FY2006 | LM | NIH

University Of Virginia, Charlottesville VA

Investigators

Linked publications & trials

Paper 28902397 Paper 27010337 Paper 25976240 Paper 24509512 Paper 23995390 Paper 23749753 Paper 22539666 Paper 22058127 Paper 20693322 Paper 20307279 Paper 20064877 Paper 19948773 Paper 15919194 Paper 14751975 Paper 12427470 Paper 12424122 Paper 12096113 Paper 10980156

Abstract

DESCRIPTION (provided by applicant): The long-term goals of our research are: (a) to develop more sensitive and reliable methods for exploiting sequence and structure information through similarity searching; and (b) to understand better the biophysical constraints on protein folding that can be identified from protein sequence information. Although similarity searching is now routinely used to characterize sequences and annotate genomes, the most widely used methods focus on speed at the expense of sensitivity and statistical accuracy. We believe that more flexible algorithms, with more accurate statistical estimates, can provide new biological insights about the structure, function, and evolutionary history of protein and DNA sequences. Over the next five years, our specific aims are: (1) To improve the FASTA programs by providing better performance on parallel (Beowulf) clusters, using vector-parallel instruction sets, and providing more accurate statistics. (2) To develop evolutionarily calibrated DNA sequence comparison algorithms using rapid initial seeding, followed by extension using context-dependent scoring matrices. The goal is to develop heuristic approaches with well-understood evolutionary horizons. (3) To develop improved strategies for identifying repeated sequences in proteins by combining optimal local alignment strategies with appropriate scoring matrices and gap penalties. (4) To develop accurate statistical estimates for profile:sequence and profile:profile similarity searches. Profile:profile comparison programs with accurate statistical estimates should substantially reduce the sensitivity gap between sequence and structure comparison. Profile:profile comparisons will be far more useful, and will allow us to explore fundamental questions about how easy it is for new protein families to emerge. (5) We will examine local sequence constraints in proteins, using each family as an independent observation.
We believe that much of the literature on the global properties of protein sequences fails to distinguish between correlations that reflect genuine biophysical constraints and correlations that reflect shared evolutionary history. We will also search for clear examples of convergent evolution: similar functions carried out by clearly non-homologous proteins. Accurate statistical estimates for searches with real protein sequences, and profiles from real protein families, can fundamentally change the inference of homology from statistically significant similarity. Because of inaccurate statistical estimates, similarity searching is often considered a tool for generating hypotheses about homology, which must be confirmed experimentally. When the statistical estimates are highly accurate, it may become possible to define homology in terms of statistically significant similarity.
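To make the link between local alignment scores and statistical significance concrete, the following is a minimal sketch, not the FASTA implementation. It assumes a simple match/mismatch scheme with a linear gap penalty in place of a real scoring matrix, and it converts a score to an expected number of chance hits with the standard Karlin-Altschul form E = K * m * n * exp(-lambda * S). The function names and all parameter values (`match`, `mismatch`, `gap`, `lam`, `k`) are hypothetical placeholders chosen for illustration; in practice lambda and K depend on the scoring system and are often estimated from the scores observed in the search itself.

```python
import math


def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best Smith-Waterman local alignment score for sequences a and b.

    Uses a match/mismatch score and a linear gap penalty (illustrative
    values, not a real substitution matrix). Only two rows of the
    dynamic-programming table are kept in memory.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(
                0,                      # start a new local alignment
                prev[j - 1] + sub,      # align a[i-1] with b[j-1]
                prev[j] + gap,          # gap in b
                curr[j - 1] + gap,      # gap in a
            )
            best = max(best, curr[j])
        prev = curr
    return best


def expected_hits(score, query_len, db_len, lam, k):
    """Expected number of chance alignments scoring at least `score`.

    Karlin-Altschul form: E = K * m * n * exp(-lambda * S).
    `lam` and `k` must match the scoring system; the values passed in
    the usage example below are placeholders, not fitted parameters.
    """
    return k * query_len * db_len * math.exp(-lam * score)


if __name__ == "__main__":
    s = local_alignment_score("GGACGTGG", "TTACGTTT")
    # Placeholder lambda and K, chosen only to show the calculation.
    e = expected_hits(s, query_len=8, db_len=1_000_000, lam=0.3, k=0.1)
    print(f"score={s} expected chance hits={e:.3g}")
```

A small expected value means a score that high is unlikely to arise by chance in a database of that size, which is the sense in which accurate estimates of lambda and K determine how reliably homology can be inferred from similarity.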
