Statistics of Sequence Comparison

$156,641 | ZIA | FY2009 | LM | NIH

National Library Of Medicine

Abstract

Work this year focused on the development of improved scoring systems for local multiple alignment, informed by the minimum description length (MDL) principle. Most pairwise and multiple sequence alignment programs seek alignments with optimal scores. Central to defining such scores is the selection of a set of substitution scores for aligned amino acids or nucleotides. Substitution scores for local pairwise alignment are implicitly of log-odds form: they compare the probabilities of aligning two letters under models of relatedness and non-relatedness. The best pairwise substitution scores are explicitly constructed in this way. We have developed ideas, based on the MDL principle, for extending this formalism to multiple alignments. Most simply, Bayesian methods can be used to derive "BILD" substitution scores from prior distributions describing columns of related letters. This approach has previously been used only to define scores for aligning individual sequences to sequence profiles, but it has much broader applicability. We have developed a method to calculate BILD scores efficiently, and have employed it in Gibbs sampling optimization procedures. We have shown that BILD scores yield improved performance in detecting related sequences and in constructing biologically accurate alignments. A manuscript describing this work is near completion.
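The log-odds idea described in the abstract can be illustrated with a small sketch. It scores a single alignment column by comparing the probability of its letters under a "related" model, obtained by integrating over a prior on column letter frequencies, with their probability under an independent background model. The abstract does not specify the form of the priors, so a single Dirichlet prior is assumed here purely for illustration; the function name `column_log_odds`, the parameter values, and the four-letter alphabet are likewise hypothetical and are not taken from the project.

```python
import math


def column_log_odds(counts, alpha, background):
    """Log-odds score, in bits, of one alignment column (illustrative sketch).

    counts[j]     -- number of times letter j occurs in the column
    alpha[j]      -- Dirichlet prior parameters (assumed, not from the source)
    background[j] -- background letter frequencies for the unrelated model
    """
    n = sum(counts)
    alpha_sum = sum(alpha)

    # Related model: log marginal probability of the observed letters after
    # integrating the column's letter frequencies over the Dirichlet prior.
    log_related = math.lgamma(alpha_sum) - math.lgamma(alpha_sum + n)
    for c, a in zip(counts, alpha):
        log_related += math.lgamma(a + c) - math.lgamma(a)

    # Unrelated model: each letter is drawn independently from the background.
    log_unrelated = sum(
        c * math.log(p) for c, p in zip(counts, background) if c > 0
    )

    # Convert the natural-log ratio to bits.
    return (log_related - log_unrelated) / math.log(2)


# Hypothetical nucleotide example: uniform background, weak symmetric prior.
background = [0.25, 0.25, 0.25, 0.25]
alpha = [0.25, 0.25, 0.25, 0.25]

conserved = column_log_odds([2, 0, 0, 0], alpha, background)   # e.g. "AA"
mismatched = column_log_odds([1, 1, 0, 0], alpha, background)  # e.g. "AC"
```

With these assumed parameters, a column of two identical letters scores positively and a column of two different letters scores negatively, which is the qualitative behavior expected of a log-odds substitution score. A column containing a single letter scores zero here because the prior mean was chosen to equal the background frequencies.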
