Scalable Homology Search Tools

$474,957 | FY2002 | BIO | NSF

University Of California-Santa Barbara, Santa Barbara CA

Investigators

Abstract

Biologists want to be able to work with large amounts of sequence data. A common task is detecting regions of high similarity, under the assumption that similarity in sequence reflects not only similar function and structure but also a degree of relatedness. Comparing very large sequences requires either high bandwidth, if the data are distributed, or many repeated accesses to the data, if they can be managed locally. An index structure built over the data can be stored in much less space than the data themselves and accessed much more rapidly. The new structure, called the Multi-Resolution String Index Structure, uses a measure corresponding to the number of changes needed to transform one sequence into another. It has been implemented in a prototype. The expanded effort will produce scalable search tools and will involve a number of students in the basic data structure research. A number of biological scientists are involved in using the software. The results of the work will be made available on the Internet.
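
The abstract does not spell out the exact measure, but "the number of changes necessary to change one sequence into another" matches the usual notion of edit distance. As a rough illustration only, and assuming unit-cost insertions, deletions, and substitutions (an assumption, not something the abstract states), the measure could be computed as below. The function name `edit_distance` is hypothetical and is not part of the project's software; the index structure itself is not shown.

```python
def edit_distance(a: str, b: str) -> int:
    """Unit-cost edit distance between sequences a and b (illustrative sketch)."""
    # prev[j] = distance between the first i-1 characters of a
    # and the first j characters of b (row of the DP table).
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca from a
                curr[j - 1] + 1,     # insert cb into a
                prev[j - 1] + cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    # Deleting the single character C turns ACGT into AGT.
    print(edit_distance("ACGT", "AGT"))
```

Computing this directly takes time proportional to the product of the two sequence lengths, which is one reason an index that avoids touching the full data is attractive for very large sequences.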
