Statistics Of Sequence Comparison
National Library Of Medicine
Abstract
This project is a continuing study of questions concerning what similarities can be expected to occur purely by chance when two protein or DNA sequences are compared. A subsidiary and related question concerns the definition of scoring systems that are optimal for distinguishing biologically meaningful patterns from chance similarities. Work this year has focused on the definition of a new measure of sequence similarity that unifies a traditional measure of alignment similarity with a new measure of compositional similarity.

In brief, protein sequence database search programs may be evaluated both for their retrieval accuracy (the ability to separate meaningful from chance similarities) and for the accuracy of their statistical assessments of reported alignments. However, methods for improving statistical accuracy can degrade retrieval accuracy by discarding compositional evidence of sequence relatedness. This evidence may be preserved by combining essentially independent measures of alignment and compositional similarity into a unified measure of sequence similarity. We have studied two measures of compositional similarity, and found that one, when combined with alignment similarity, improves the statistical accuracy of blastp, as well as its retrieval accuracy measured using a SCOP-based test set.
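As an illustration only, one standard way to merge two essentially independent pieces of statistical evidence is to combine their P-values through the distribution of their product. The sketch below assumes that alignment similarity and compositional similarity have each already been expressed as a P-value, and that the two are independent and uniform on (0, 1] under the null model. The abstract does not state that the project combines its measures this way, and the function and parameter names are hypothetical.

```python
import math


def combine_independent_pvalues(p_align: float, p_comp: float) -> float:
    """Return a unified P-value for two independent P-values.

    Assumption (not taken from the abstract): under the null model both
    inputs are independent and uniform on (0, 1]. The probability that
    their product is at most x is then x * (1 - ln x), which is the
    two-test case of Fisher's method for combining P-values.
    """
    for p in (p_align, p_comp):
        if not 0.0 < p <= 1.0:
            raise ValueError("P-values must lie in the interval (0, 1]")
    x = p_align * p_comp
    return x * (1.0 - math.log(x))


if __name__ == "__main__":
    # Hypothetical inputs: moderate alignment evidence plus moderate
    # compositional evidence for the same pair of sequences.
    print(combine_independent_pvalues(0.05, 0.05))
```

Under these assumptions the combined value is always at least as large as the raw product of the two inputs, so the product itself would overstate significance if it were reported directly.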