Improvements and Extensions to the BLAST Algorithms
National Library of Medicine
Abstract
The BLAST family of protein and DNA database search programs constitutes one of the key services offered by the NCBI. These programs are currently run on NCBI servers about 200,000 times during a typical weekday. This project represents an ongoing effort to improve and extend the functionality of these programs. Efforts this year have focused primarily on adding new scoring systems and statistics to existing programs. First, we have added compositional statistics (matrix scaling) to the tblastn program, which greatly improves the accuracy of reported E-values. This work involved a fair amount of experimentation with different ways of defining and estimating the "amino acid composition" of conceptually translated DNA sequences. Second, we have added compositional substitution matrix adjustment to the blastp and tblastn programs. This permits the substitution matrix to be adjusted so that it is consistent with the compositions of the sequences being compared. For most related sequence pairs, this improves both the bit score and the alignment quality, and the result is improved sensitivity of general-purpose database searches. Compositional adjustment has been added as an option within NCBI's publicly available BLAST code. It may be invoked universally, or conditioned upon the relative lengths and compositions of the sequences being compared.
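The idea behind matrix scaling can be illustrated with a small sketch. It is not NCBI's implementation: it assumes the standard ungapped Karlin-Altschul condition, in which the scale parameter lambda is the positive solution of sum_ij p_i q_j exp(lambda * s_ij) = 1, and it uses a toy four-letter alphabet with a hypothetical +1/-3 scoring scheme instead of a 20-letter amino acid matrix. The function names `ungapped_lambda` and `rescale_matrix` are illustrative only. The sketch computes lambda for a reference composition and for a biased composition, then multiplies the scores by the ratio of the two so that the reference lambda applies again under the biased composition.

```python
import math

def ungapped_lambda(scores, p, q, tol=1e-12):
    """Positive root of sum_ij p[i]*q[j]*exp(lam*scores[i][j]) = 1 (bisection)."""
    pairs = [(p[i] * q[j], scores[i][j])
             for i in range(len(p)) for j in range(len(q))]
    expected = sum(w * s for w, s in pairs)
    # A positive root exists only if the expected score is negative
    # and at least one pair with nonzero probability scores positively.
    if expected >= 0 or max(s for w, s in pairs if w > 0) <= 0:
        raise ValueError("need negative expected score and a positive score")

    def f(lam):
        return sum(w * math.exp(lam * s) for w, s in pairs) - 1.0

    lo, hi = 0.0, 1.0
    while f(hi) <= 0:          # expand until the root is bracketed
        hi *= 2.0
    while hi - lo > tol:       # f is convex with f(0) = 0, so the root is unique
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

def rescale_matrix(scores, factor):
    """Multiply every score by `factor` (kept as floats for clarity)."""
    return [[s * factor for s in row] for row in scores]

# Toy scoring scheme (assumption): match +1, mismatch -3, four letters.
toy_scores = [[1 if i == j else -3 for j in range(4)] for i in range(4)]
uniform = [0.25, 0.25, 0.25, 0.25]   # reference composition
biased = [0.4, 0.4, 0.1, 0.1]        # hypothetical skewed composition

lam_ref = ungapped_lambda(toy_scores, uniform, uniform)
lam_biased = ungapped_lambda(toy_scores, biased, biased)

# Scale the scores so that, under the biased composition,
# the reference lambda is again the correct scale parameter.
scaled = rescale_matrix(toy_scores, lam_biased / lam_ref)
lam_after = ungapped_lambda(scaled, biased, biased)
```

In this toy case the biased composition yields a smaller lambda than the reference one, meaning that raw scores interpreted with the reference lambda would look more significant than they are. After rescaling, `lam_after` matches `lam_ref`, so statistics derived from the reference lambda can be applied to the scaled scores.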