
EFFICIENT CHARACTERIZATION OF ALU INSERTION SITES VIA MODEL-BASED SEARCH TREE

$14,580 | P20 | FY2009 | RR | NIH

Louisiana State Univ A&M Col Baton Rouge, Baton Rouge LA

Abstract

This subproject is one of many research subprojects utilizing the resources provided by a Center grant funded by NIH/NCRR. The subproject and investigator (PI) may have received primary funding from another NIH source, and thus could be represented in other CRISP entries. The institution listed is for the Center, which is not necessarily the institution for the investigator.

Alu elements are primate-specific short interspersed elements (SINEs). Although Alus have no known biological function, numerous studies have shown that Alu insertions and mutations contribute to a significant proportion of human genetic diseases. With the development of computational biology, biologists identified the preferred, immediate consensus sequences at the site of Alu insertion by aligning fewer than tens of sequences near the insertion site. However, it remains an open question, across thousands of DNA sequences, how broader-scale features in both the 5' flanking region and the 3' region may also influence the likelihood of Alu insertion, especially with respect to different Alu subfamilies. Recent progress in data mining research and the exponential growth of biological data over the past decade allow us to approach this question from a new perspective. In this report, we present our ongoing efforts to develop the first specialized data mining framework that can determine the larger-scale features of DNA sequences, on the order of hundreds or thousands of base pairs, that facilitate or adversely affect Alu insertions. The proposed framework is based on one of the most sophisticated frequent pattern classification algorithms, Model Based Search Tree (MBST), which was recently co-developed by the PI. We apply the proposed algorithm to eight DNA datasets, each containing thousands of Pre-Alu-Insertion (PAI) and Non-Pre-Alu-Insertion (NPAI) sequences. The results demonstrate that our method can not only accurately identify PAIs with over 86% precision and recall, but also capture sequential patterns of hundreds of bps in both flanking regions that encompass the previously reported consensus sequences. The discoveries presented in this report, as well as the research under way, may provide biological insights that enhance our understanding of the insertion mechanism of Alu elements and our ability to predict which sites are more prone to damage by mobile element insertions.
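For readers unfamiliar with the two metrics quoted in the abstract, the following is a minimal sketch of how precision and recall could be computed for a PAI versus NPAI classifier. It is not the MBST algorithm and does not reproduce the project's evaluation; the function name `precision_recall` and the toy label lists are hypothetical and chosen only for illustration.

```python
def precision_recall(true_labels, predicted_labels, positive="PAI"):
    """Return (precision, recall) for the class named by `positive`.

    Hypothetical helper for illustration; labels are plain strings.
    """
    tp = fp = fn = 0
    for truth, guess in zip(true_labels, predicted_labels):
        if guess == positive and truth == positive:
            tp += 1  # predicted PAI, actually PAI
        elif guess == positive:
            fp += 1  # predicted PAI, actually NPAI
        elif truth == positive:
            fn += 1  # predicted NPAI, actually PAI
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy labels, invented for this example (not data from the project).
truth = ["PAI", "PAI", "NPAI", "NPAI", "PAI"]
guess = ["PAI", "NPAI", "PAI", "NPAI", "PAI"]
precision, recall = precision_recall(truth, guess)
print(f"precision={precision:.3f} recall={recall:.3f}")
```

Precision is the fraction of sequences predicted as PAI that truly are PAI, while recall is the fraction of true PAI sequences that the classifier recovers; the abstract reports both above 86%.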

View original record on NIH RePORTER →