Improvements And Extensions To The Blast Algorithms

$0 · Z01 · FY2005 · LM · NIH

National Library Of Medicine

Abstract

The BLAST family of protein and DNA database search programs is one of the key services offered by the NCBI. These programs are currently run on NCBI servers about 200,000 times during a typical weekday. This project is an ongoing effort to improve and extend the functionality of these programs. Efforts this year have focused primarily on adding new scoring systems and statistics to existing programs. First, we have added compositional statistics (matrix scaling) to the tblastn program, which greatly improves the accuracy of reported E-values. This work involved a fair amount of experimentation with different ways of defining and estimating the "amino acid composition" of conceptually translated DNA sequences. Second, we have added compositional substitution matrix adjustment to the blastp and tblastn programs. This permits the substitution matrix to be adjusted so that it is consistent with the compositions of the sequences being compared. For most related sequence pairs, this improves both the bit score and alignment quality. The result is improved sensitivity of general-purpose database searches. Compositional adjustment has been added as an option within NCBI's publicly available BLAST code. It may be invoked universally, or conditioned upon the relative lengths and compositions of the sequences being compared.
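
To make the idea of an "amino acid composition" for conceptually translated DNA concrete, the following is a minimal sketch of one possible definition: translate a single reading frame with the standard genetic code and count residue frequencies, ignoring stop codons and ambiguous codons. The abstract does not say which definition was adopted, so this is an illustration only and not NCBI's method; the function names `translate` and `composition` are hypothetical.

```python
from collections import Counter
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered with
# bases T, C, A, G and the third position varying fastest.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna, frame=0):
    """Conceptually translate one forward reading frame (0, 1, or 2).

    Codons containing anything other than A, C, G, T become 'X'.
    """
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X")
        for i in range(frame, len(dna) - 2, 3)
    )


def composition(protein):
    """Return residue frequencies, skipping stops ('*') and unknowns ('X').

    Assumption for illustration: stops and unknowns are simply dropped.
    Other definitions are possible and may behave differently.
    """
    counts = Counter(aa for aa in protein if aa not in "*X")
    total = sum(counts.values())
    if total == 0:
        return {}
    return {aa: n / total for aa, n in counts.items()}


if __name__ == "__main__":
    peptide = translate("ATGGCCTAA")
    print(peptide)
    print(composition(peptide))
```

Even in this simple form, several choices have to be made (which frames to include, how to treat stop codons, whether to use the whole sequence or only the aligned region), which suggests why the definition required experimentation.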

View original record on NIH RePORTER →