Improvements and Extensions to the BLAST Algorithms
National Library Of Medicine
Abstract
The BLAST family of protein and DNA database search programs constitutes one of the key services offered by the NCBI. These programs are currently run on NCBI servers about 70,000 times during a typical weekday. This project represents an ongoing effort to improve and extend the functionality of these programs. Efforts this year have focused primarily on the use of compositionally adjusted amino acid substitution matrices with "Blastp", the program for comparing protein queries to protein databases. Preliminary results indicate that such score matrices are useful primarily for comparing sequences of similar length (differing by less than a factor of two) and sequences whose compositions are relatively similar, as measured by relative entropy. Algorithmically, we have needed to rewrite the heuristic for producing gapped alignments when the substitution matrix used is modified. This is because the rescaling procedure employed for compositionally adjusted statistics essentially never produced longer alignments, whereas the new compositionally adjusted matrices frequently yield extended alignments.
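The two criteria mentioned above (a length ratio below two and similar compositions as measured by relative entropy) can be illustrated with a minimal sketch. This is not the NCBI implementation: the function names, the pseudocount, the direction of the relative entropy, and the `max_entropy` threshold are all assumptions made for illustration, since the abstract gives no numeric cutoff for "relatively similar".

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(seq, pseudocount=1.0):
    """Amino acid frequencies of seq, smoothed with a pseudocount
    (an assumption, used here so that no frequency is zero)."""
    counts = Counter(c for c in seq.upper() if c in AMINO_ACIDS)
    total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
    return {a: (counts[a] + pseudocount) / total for a in AMINO_ACIDS}


def relative_entropy(p, q):
    """Relative entropy D(p || q) in bits. It is asymmetric; the
    abstract does not say which direction or symmetrization is used."""
    return sum(p[a] * math.log2(p[a] / q[a]) for a in AMINO_ACIDS)


def lengths_comparable(a, b, max_ratio=2.0):
    """True if the sequence lengths differ by less than max_ratio."""
    shorter, longer = sorted((len(a), len(b)))
    return shorter > 0 and longer / shorter < max_ratio


def adjusted_matrix_candidate(a, b, max_ratio=2.0, max_entropy=0.2):
    """Hypothetical filter: both criteria must hold.
    max_entropy is an illustrative placeholder, not a published value."""
    if not lengths_comparable(a, b, max_ratio):
        return False
    return relative_entropy(composition(a), composition(b)) < max_entropy
```

A caller would pass two protein sequences as plain strings, for example `adjusted_matrix_candidate(query, subject)`, and fall back to the standard matrix when the function returns `False`. Any real threshold would have to be calibrated against the preliminary results the abstract refers to.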