Statistics Of Sequence Comparison
National Library Of Medicine
Abstract
Work this year included the publication of a new measure of sequence similarity that unifies a traditional measure of alignment similarity with a new measure of compositional similarity.

In brief, protein sequence database search programs may be evaluated both for their retrieval accuracy - the ability to separate meaningful from chance similarities - and for the accuracy of their statistical assessments of reported alignments. However, methods for improving statistical accuracy can degrade retrieval accuracy by discarding compositional evidence of sequence relatedness. This evidence may be preserved by combining essentially independent measures of alignment and compositional similarity into a unified measure of sequence similarity. We have studied two measures of compositional similarity, and found that one, when combined with alignment similarity, improves the statistical accuracy of blastp, as well as its retrieval accuracy measured using a SCOP-based test set.

Further work this year focussed on developing scoring systems for recognizing correlated positions in multiple sequence alignments. We have built on other published work to confirm that in alignments of simulated sequences with correlated mutations, a form of normalized mutual information (nmi) appears to be the most effective measure. We have studied the mean and standard deviation of nmi for uncorrelated multiple alignment columns as a function of sequence number N and average per position relative entropy h.
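To make the idea of scoring correlated alignment columns concrete, the following is a minimal sketch of one common form of normalized mutual information between two columns. The abstract does not say which normalization was used, so dividing the mutual information by the joint entropy is an assumption here, and the names `entropy` and `column_nmi` are hypothetical.

```python
import math
from collections import Counter


def entropy(counts, total):
    """Shannon entropy in bits of a distribution given as raw counts."""
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def column_nmi(col_a, col_b):
    """Mutual information of two alignment columns divided by their joint entropy.

    Assumption: this normalization (MI / H(A,B)) is one common choice and is
    not necessarily the form of nmi studied in the project.
    """
    if len(col_a) != len(col_b):
        raise ValueError("columns must have the same number of sequences")
    n = len(col_a)
    if n == 0:
        raise ValueError("columns must not be empty")

    h_a = entropy(Counter(col_a).values(), n)
    h_b = entropy(Counter(col_b).values(), n)
    h_ab = entropy(Counter(zip(col_a, col_b)).values(), n)

    # Two fully conserved columns carry no covariation signal.
    if h_ab == 0:
        return 0.0

    mutual_information = h_a + h_b - h_ab
    return mutual_information / h_ab


# Each string holds the residues of one column, one character per sequence.
perfectly_coupled = column_nmi("AAGG", "LLVV")
independent = column_nmi("AAGG", "LVLV")
```

Under this normalization the score lies between 0 and 1, reaching 1 when each residue in one column determines the residue in the other. With few sequences, even uncorrelated columns tend to produce positive scores, which is why the background mean and standard deviation as a function of N matter when interpreting a score.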