Improvements and Extensions to the BLAST Algorithms
National Library of Medicine
Abstract
Work this year focused on improving PSI-BLAST by developing new methods for two tasks: estimating the effective number of independent observations represented in an alignment column, and calculating the number of pseudocounts that should be used in constructing PSI-BLAST substitution scores. In brief, PSI-BLAST estimates the probabilities of amino acids occurring at an alignment position by combining N "effective" observed amino acid counts with n data-dependent pseudocounts. Because the aligned sequences are correlated, the number N of independent observations represented by an alignment column is not simply the number of sequences aligned. We have described a logically improved method for estimating N, and have found that its implementation improves PSI-BLAST retrieval accuracy. We have also developed a method, inspired by the minimum description length (MDL) principle, for adjusting the number of pseudocounts n as a function of column composition. As both theory and experiment suggest, n should be larger for more variable positions. Both improvements are now implemented in PSI-BLAST and are used by default.
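The way N observed counts and n pseudocounts combine can be illustrated with a minimal sketch. It assumes the common weighted-average form Q = (N·f + n·g) / (N + n), where f holds the observed frequencies in the column and g holds the pseudocount frequencies; the abstract does not give the exact formula. The function name, the toy three-letter alphabet, and all numeric values are hypothetical. The sketch also takes N and n as given inputs and does not reproduce the new methods for estimating them.

```python
def column_probabilities(observed_freqs, pseudo_freqs, N, n):
    """Estimate residue probabilities for one alignment column.

    observed_freqs: dict residue -> observed frequency f (sums to 1)
    pseudo_freqs:   dict residue -> pseudocount frequency g (sums to 1)
    N: effective number of independent observations (assumed given)
    n: number of pseudocounts (assumed given)

    Assumed mixing rule (not stated in the abstract):
        Q = (N * f + n * g) / (N + n)
    """
    if N < 0 or n < 0 or N + n == 0:
        raise ValueError("N and n must be non-negative and not both zero")
    residues = set(observed_freqs) | set(pseudo_freqs)
    return {
        a: (N * observed_freqs.get(a, 0.0) + n * pseudo_freqs.get(a, 0.0)) / (N + n)
        for a in residues
    }


# Hypothetical toy column over a three-letter alphabet.
observed = {"A": 0.5, "L": 0.5}            # V was never observed
pseudo = {"A": 0.25, "L": 0.25, "V": 0.5}  # illustrative pseudocount frequencies

few = column_probabilities(observed, pseudo, N=3.0, n=1.0)
many = column_probabilities(observed, pseudo, N=3.0, n=9.0)

print(few["V"])   # 0.125: the unobserved residue gets a small nonzero probability
print(many["V"])  # 0.375: more pseudocounts pull the estimate toward g
```

Under this assumed form, a larger n relative to N moves the estimate toward the pseudocount frequencies. A larger n for more variable positions therefore makes those columns rely less on their raw observed counts.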