DATA MINING MICROARRAY GENE EXPRESSION DATA FOR PHYSIOLOGICAL DISCOVERY
Louisiana State Univ A&M Col Baton Rouge, Baton Rouge LA
Abstract
This subproject is one of many research subprojects utilizing the resources provided by a Center grant funded by NIH/NCRR. The subproject and investigator (PI) may have received primary funding from another NIH source, and thus could be represented in other CRISP entries. The institution listed is for the Center, which is not necessarily the institution for the investigator.
One of the challenges facing biology in general, and consequently multidisciplinary research in computational biology, is assigning biochemical and cellular functions to the thousands of previously uncharacterized gene products discovered by several international gene-sequencing projects. Data mining offers the promise of precise, objective, and accurate in silico analysis of high-dimensional data, using knowledge-discovery routines that reveal embedded patterns, trends, and anomalies and that yield models for faster, more accurate physiological discovery. Feature fusion methods can effectively integrate information from multiple data sources to reinforce learning and support accurate prediction and analysis. In this research, we have developed feature extraction, fusion, and classification algorithms for bioinformatics data mining problems. We present a new method, BiEntropy, that combines information entropy with closed frequent pattern mining to identify groups of co-expressed genes whose patterns hold across a subset of conditions (biclusters). Our goal is to discover different forms of local patterns (constant, additive, and overlapping) in gene expression data. To demonstrate the method's gains, we implement the algorithm with two novel discretization schemes for discovering biologically enriched biclusters. We apply the method to both synthetic and real data to assess the effect of each discretization scheme and to show that the method extracts both artificially embedded and biologically enriched biclusters.
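The abstract gives no algorithmic detail for BiEntropy, so what follows is only a toy sketch of the general idea it names, not the authors' method. It assumes equal-width discretization of each gene into three levels, an entropy threshold that discards near-flat genes, and the common encoding of each gene as a transaction of (condition, symbol) items, under which a closed frequent itemset together with its supporting genes forms a bicluster. Closed itemsets are enumerated by brute-force intersection of transactions, which is practical only for tiny matrices. `discretize_row`, `mine_biclusters`, `min_entropy`, and the other names are hypothetical.

```python
import math
from collections import Counter


def discretize_row(row, levels=3):
    """Equal-width discretization of one gene's profile into symbols 0..levels-1."""
    lo, hi = min(row), max(row)
    if hi == lo:  # flat profile: every value maps to symbol 0
        return [0] * len(row)
    width = (hi - lo) / levels
    return [min(int((v - lo) / width), levels - 1) for v in row]


def shannon_entropy(symbols):
    """Shannon entropy, in bits, of the symbol distribution of one profile."""
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in Counter(symbols).values())


def closed_itemsets(transactions):
    """All non-empty closed itemsets, generated as intersections of transactions.

    A closed itemset is exactly the intersection of the transactions that
    contain it, so intersecting until nothing new appears reaches all of them.
    Worst-case exponential: suitable only for toy inputs.
    """
    tx = [frozenset(t) for t in transactions]
    closed, frontier = set(tx), set(tx)
    while frontier:
        new = {a & t for a in frontier for t in tx} - closed - {frozenset()}
        closed |= new
        frontier = new
    return closed


def mine_biclusters(matrix, levels=3, min_entropy=0.5, min_genes=3, min_conds=2):
    """Return (genes, [(condition, symbol), ...]) pairs, one per bicluster found."""
    # 1. Discretize each gene; drop near-flat (low-entropy) profiles (assumed filter).
    kept = {}
    for g, row in enumerate(matrix):
        symbols = discretize_row(row, levels)
        if shannon_entropy(symbols) >= min_entropy:
            kept[g] = symbols
    # 2. One transaction per gene; items are (condition, symbol) pairs.
    transactions = {g: frozenset(enumerate(s)) for g, s in kept.items()}
    # 3. A closed itemset with enough supporting genes and conditions is a
    #    bicluster: those genes share the same symbol on each of its conditions.
    biclusters = []
    for itemset in closed_itemsets(transactions.values()):
        genes = sorted(g for g, t in transactions.items() if itemset <= t)
        if len(genes) >= min_genes and len(itemset) >= min_conds:
            biclusters.append((genes, sorted(itemset)))
    return sorted(biclusters)


# Toy expression matrix (genes x conditions).
expr = [
    [1.0, 5.0, 9.0, 2.0],   # gene 0
    [2.0, 6.0, 10.0, 9.0],  # gene 1: gene 0 shifted by +1 on conditions 0-2
    [0.0, 4.0, 8.0, 1.0],   # gene 2: gene 0 shifted by -1 on conditions 0-2
    [5.0, 5.0, 5.0, 5.0],   # gene 3: flat, removed by the entropy filter
    [9.0, 1.0, 5.0, 3.0],   # gene 4: unrelated profile
]
print(mine_biclusters(expr))  # → [([0, 1, 2], [(0, 0), (1, 1), (2, 2)])]
```

Here the additive pattern on genes 0-2 turns into identical symbols only because each of those genes reaches its minimum and maximum inside the shared conditions. In general, per-gene equal-width discretization does not map additive patterns to constant ones, which is one reason the choice of discretization scheme matters.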
We have also developed a new spectral-coherence-based feature integration method that preserves sequence-order properties and yields considerable gains in supervised classification. In another study, we developed a classification method that combines vector-quantization-based feature extraction with an association-rule-based approach to isomorphic pattern discovery and fusion.
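The abstract does not say how sequences are turned into signals or how coherence is estimated. The sketch below assumes one plausible route: map each residue to two numeric property scales (the Kyte-Doolittle hydropathy index and a hypothetical, simplified charge scale), then use the magnitude-squared coherence between the two signals, C_xy(f) = |S_xy(f)|^2 / (S_xx(f) S_yy(f)), as a fixed-length feature vector. Spectra are averaged over non-overlapping, mean-removed segments (Bartlett averaging); without averaging, a single-segment estimate equals 1 at every frequency. Because the transform runs along residue position, reordering residues changes the features, which is one sense in which such features preserve sequence order. `dft_half`, `coherence`, `coherence_features`, `CHARGE`, and `seg_len` are illustrative names, not the authors' implementation.

```python
import cmath
import random


def dft_half(x):
    """Naive DFT of a real sequence, returning bins 0..len(x)//2 (O(n^2))."""
    n = len(x)
    return [sum(v * cmath.exp(-2j * cmath.pi * k * t / n) for t, v in enumerate(x))
            for k in range(n // 2 + 1)]


def coherence(x, y, seg_len):
    """Magnitude-squared coherence |Sxy|^2 / (Sxx * Syy), averaged over
    non-overlapping, mean-removed segments (rectangular window).
    Returns bins 1..seg_len//2; the DC bin is dropped after mean removal."""
    n_seg = min(len(x), len(y)) // seg_len
    if n_seg < 2:
        raise ValueError("need >= 2 segments; one segment gives coherence 1 everywhere")
    n_bins = seg_len // 2 + 1
    sxx, syy, sxy = [0.0] * n_bins, [0.0] * n_bins, [0j] * n_bins
    for s in range(n_seg):
        xs, ys = x[s * seg_len:(s + 1) * seg_len], y[s * seg_len:(s + 1) * seg_len]
        mx, my = sum(xs) / seg_len, sum(ys) / seg_len
        fx = dft_half([v - mx for v in xs])
        fy = dft_half([v - my for v in ys])
        for k in range(n_bins):  # accumulate auto- and cross-spectra
            sxx[k] += abs(fx[k]) ** 2
            syy[k] += abs(fy[k]) ** 2
            sxy[k] += fx[k] * fy[k].conjugate()
    return [abs(sxy[k]) ** 2 / (sxx[k] * syy[k]) if sxx[k] * syy[k] > 0 else 0.0
            for k in range(1, n_bins)]


# Kyte-Doolittle hydropathy index.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
# Hypothetical, simplified side-chain charge scale (His treated as neutral).
CHARGE = {aa: 0.0 for aa in KYTE_DOOLITTLE}
CHARGE.update({"K": 1.0, "R": 1.0, "D": -1.0, "E": -1.0})


def coherence_features(seq, scale_a=KYTE_DOOLITTLE, scale_b=CHARGE, seg_len=8):
    """Fixed-length (seg_len // 2) vector: coherence of two residue-property signals."""
    a = [scale_a[r] for r in seq]
    b = [scale_b[r] for r in seq]
    return coherence(a, b, seg_len)


# Usage: coherence is ~1 for linearly related signals and near 0 for independent noise.
rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(512)]
linked = coherence(x, [2.0 * v + 3.0 for v in x], seg_len=16)
unrelated = coherence(x, [rng.gauss(0.0, 1.0) for _ in range(512)], seg_len=16)
protein_features = coherence_features("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ")
print(min(linked), sum(unrelated) / len(unrelated), protein_features)
```

The vector length depends only on `seg_len`, not on sequence length, so proteins of different sizes yield comparable inputs for a classifier. For real use, SciPy's `scipy.signal.coherence` provides a Welch estimate with tapered, overlapping windows.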