GGrantIndex
← Search

Efficient Algorithms for Motif Search

$355,196R01FY2011LMNIH

University Of Connecticut Storrs, Storrs-Mansfield CT

Investigators

Linked publications & trials

Abstract

DESCRIPTION (provided by applicant): Multiple genome projects have generated large volumes of DNA, RNA and protein sequence data. While computational pattern searching techniques such as BLAST (a program for measuring sequence similarities) have enabled major discoveries such as the modularity of proteins, much more information can be gained from biological sequence data. However, due to the length and complexity of the patterns, we are limited by the computational algorithms used for motif discovery. Simple Motif Search (SMS), Planted Motif Search (PMS) and Edit-distance Motif Search (EMS) are the three principal paradigms that have been previously used for identifying short functional peptide motifs, transcriptional regulatory elements, composite regulatory patterns, DNA motifs, similarity between families of proteins, etc. Our group has been instrumental in developing algorithms for these problems. Existing pattern-search algorithms for motif search have two major shortcomings: 1) Approximate algorithms do not always identify the correct pattern, but have the advantage that they can be used to look for short and relatively large patterns in large data sets such as genomes. 2) In contrast, an exact algorithm always identifies the correct pattern, but cannot be used to identify complex data patterns in large datasets. To extract more sophisticated patterns from genomic data we need exact algorithms that can be used to analyze genomes for complex patterns with reasonable computational resources. Exact algorithms are currently limited because the run times of these algorithms are exponentially dependent on the parameters involved. For example, the currently best known algorithms for PMS and EMS (on a PC) are expected to take more than a month for patterns of length 27 and more than 5.67 years for patterns of length 31. In this project, we propose to develop the next generation SMS, PMS and EMS algorithms that identify more complex patterns in larger datasets using less computation time and memory. We also propose to develop a web based system that will incorporate PMS and EMS algorithms. Open source versions of all the algorithms and data developed will be made available to users. In addition, the web system will support online processing of queries that involve the solution of PMS and EMS.

View original record on NIH RePORTER →