Improvements and Extensions to the BLAST Algorithms
National Library Of Medicine
Abstract
A problem that occasionally arises in PSI-BLAST searches is the "corruption" of the evolving sequence profile through the inclusion of non-homologous sequences in the PSI-BLAST multiple alignment. In previous years, corruption has been greatly reduced through the improvement of PSI-BLAST statistics, most importantly by accounting for non-standard sequence composition. Recently, however, it has been observed in the literature that PSI-BLAST profiles may become corrupted through "homologous over-extension", a problem that cannot be remedied by improved statistics. In brief, this problem arises when the boundaries of an otherwise "true" alignment are miscalculated, making the alignment longer than it should be. If such an alignment extends into a domain in the subject sequence that occurs widely in the database, subsequent PSI-BLAST iterations can, in a ratchet-like manner, come to include the complete domain, even though it does not exist in the query sequence. The problem is due not to faulty statistics, but to faulty alignment. One solution to this problem has been proposed in the literature, but we have adopted what we consider a better remedy, based upon trimming alignments at each end by a certain number of bits. This year we have continued to test and refine our approach. By standard pooled ROC-n measures, we have achieved results better than those of the baseline PSI-BLAST program. However, analysis suggests that further improvement is possible with an approach that analyzes multiple PSI-BLAST hits simultaneously. Development of this method continues. No publications have yet resulted.
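As an illustration only, the idea of trimming an alignment at each end by a fixed number of bits might be sketched as follows. The abstract does not describe the actual procedure, so everything here is an assumption: the function name trim_alignment_ends, the per-column score list column_bits, the threshold trim_bits, and the rule of removing columns from each end until the removed score reaches the threshold are all hypothetical and are not the PSI-BLAST implementation.

```python
from typing import List, Tuple


def trim_alignment_ends(column_bits: List[float], trim_bits: float) -> Tuple[int, int]:
    """Return the half-open column range (start, end) kept after trimming.

    Hypothetical rule (an assumption, not the method from the abstract):
    columns are removed from each end until the score removed from that
    end sums to at least `trim_bits`.
    """
    n = len(column_bits)

    # Walk inward from the left end, accumulating the score being removed.
    start, removed = 0, 0.0
    while start < n and removed < trim_bits:
        removed += column_bits[start]
        start += 1

    # Walk inward from the right end, never crossing the left boundary.
    end, removed = n, 0.0
    while end > start and removed < trim_bits:
        removed += column_bits[end - 1]
        end -= 1

    return start, end


# Made-up per-column scores in bits for a seven-column alignment.
scores = [2.0, 1.0, 3.0, 4.0, 5.0, 1.0, 2.0]
start, end = trim_alignment_ends(scores, trim_bits=3.0)
kept = scores[start:end]
```

In this sketch an alignment too short to survive trimming yields an empty range (start equal to end), which a caller could treat as "exclude this hit from the profile". Whether the real method drops or keeps such hits is not stated in the abstract.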