Improvements and Extensions to the BLAST Algorithms

$0 · Z01 · FY2006 · LM · NIH

National Library of Medicine

Abstract

The BLAST family of protein and DNA database search programs constitutes one of the key services offered by the NCBI. These programs are currently run on NCBI servers about 200,000 times during a typical weekday. This project represents an ongoing effort to improve and extend the functionality of these programs. Improvements this year have centered on the blastp and tblastn programs.

The blastp program was modified to allow it to use compositionally adjusted scoring matrices as an alternative to the compositional scaling that has been available for five years. This permits the substitution matrix used to score alignments to be adjusted so that it is consistent with the compositions of the sequences being compared. A study we published this year shows that compositional matrix adjustment is recommended only under certain conditions, so it may be invoked either universally or conditionally. A further study has shown that using neither compositional scaling nor compositional adjustment yields very unreliable statistics, so compositional scaling has been adopted as the default behavior for blastp. After further experience, we may change the default behavior to conditional compositional matrix adjustment.

The program tblastn has been modified so that its substitution matrix may be modified by either compositional scaling or conditional compositional matrix adjustment. Because each database sequence is a DNA sequence that is conceptually translated in six frames, at least five of which are usually incorrect, matrix modification requires the definition of a sequence window from which to calculate sequence composition. We have constructed a way to define such a window that yields good empirical results. Our studies have shown that either type of substitution matrix modification yields statistics that are much more accurate than those of the baseline program, with only a minor attendant decrease in retrieval accuracy.
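
The idea of calculating composition from a sequence window can be illustrated with a minimal Python sketch. This is not the window definition used in tblastn, which the abstract does not describe. The function name `window_composition`, the fixed `margin`, and the rule of clipping the window at the nearest stop codon (`*`) are all assumptions made for illustration only.

```python
from collections import Counter


def window_composition(translated, start, end, margin=3):
    """Return residue frequencies for translated[start:end] widened by `margin`.

    Hypothetical rule (an assumption of this sketch, not the tblastn method):
    the window is clipped to the sequence ends and to the nearest stop
    codon ('*') on either side of the aligned region.
    """
    lo = max(0, start - margin)
    hi = min(len(translated), end + margin)

    # Do not extend leftwards across a stop codon.
    left_stop = translated.rfind("*", lo, start)
    if left_stop != -1:
        lo = left_stop + 1

    # Do not extend rightwards across a stop codon.
    right_stop = translated.find("*", end, hi)
    if right_stop != -1:
        hi = right_stop

    window = translated[lo:hi]
    total = len(window)
    if total == 0:
        return {}
    return {aa: n / total for aa, n in Counter(window).items()}


# One translated frame with stop codons on both sides of the aligned region.
frame = "MK*AAGLLKK*WW"
composition = window_composition(frame, start=5, end=8, margin=3)
```

In this example the aligned region `frame[5:8]` is widened to `AAGLLKK`, since the stop codons at positions 2 and 10 bound the window. Frequencies computed this way would then feed whichever matrix modification is chosen, compositional scaling or compositional matrix adjustment.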