Statistics Of Sequence Comparison

$0 | Z01 | FY2002 | LM | NIH

National Library Of Medicine

Investigators

Linked publications & trials

Paper 18586708 Paper 17068079 Paper 15509610 Paper 14663142 Paper 11139604

Abstract

This project is a continuing study of which similarities can be expected to occur purely by chance when two protein or DNA sequences are compared. A related, subsidiary question concerns the definition of scoring systems that are optimal for distinguishing biologically meaningful patterns from chance similarities. Work this year includes: a) Investigation of the statistics of a block-based scoring system. Within certain parameter ranges, the distribution of optimal block-based scores was found to be reasonably well modelled by an extreme value distribution. Whether such scores are more sensitive in recognizing distant biological relationships than protein "profiles" or position-specific score matrices remains to be determined. b) Initial investigation of the statistics of the "hybrid" local alignment scoring system. This method was found to produce scores that follow an extreme value distribution with a predictable scale parameter lambda.
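
As a point of reference for the extreme value distribution and its scale parameter lambda mentioned in the abstract, the following is a minimal sketch of the Gumbel-type tail commonly used for optimal local alignment scores, P(S >= x) = 1 - exp(-K m n e^(-lambda x)). The function names, the constant K, and every numeric value here are illustrative assumptions and are not taken from this project; the block-based and hybrid scoring systems themselves are not implemented.

```python
import math


def expected_chance_hits(score, m, n, lam, k):
    """Expected number of chance alignments scoring at least `score`.

    m, n : lengths of the two sequences being compared
    lam  : scale parameter (lambda) of the extreme value distribution
    k    : multiplicative constant of the distribution (assumed known)
    """
    return k * m * n * math.exp(-lam * score)


def evd_p_value(score, m, n, lam, k):
    """P(S >= score) under the extreme value model: 1 - exp(-E)."""
    e = expected_chance_hits(score, m, n, lam, k)
    # -expm1(-e) equals 1 - exp(-e) but stays accurate when e is tiny.
    return -math.expm1(-e)


if __name__ == "__main__":
    # Hypothetical parameters chosen only for illustration.
    lam, k = 0.3, 0.1
    for s in (30, 50, 70):
        print(s, evd_p_value(s, m=300, n=300, lam=lam, k=k))
```

In this form a larger lambda makes the tail fall off faster, so the same raw score becomes more significant, which is why the significance of a score depends on knowing lambda for the scoring system in use.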
