
CAREER: A Peer-to-Peer Framework for Decentralized Resource Administration and Management in Grid Computing

$461,197 | FY2003 | CSE | NSF

Purdue University, West Lafayette IN

Abstract

Data mining is the process of automatically extracting useful information hidden in large data sets. This emerging discipline is becoming increasingly important as advances in data collection have led to explosive growth in the amount of available data. This project aims to develop a wide range of novel data mining algorithms suited to the characteristics of scientific data sets arising in genomics and fluid dynamics. Our research will focus on developing algorithms both for sequential data sets and for data sets that can be represented by directed labeled graphs. Graph-based modeling enables us to capture, in a single unified framework, many of the spatial, topological, geometric, and other relational characteristics present in scientific data sets. The specific research tasks that we plan to address are: (i) Development of scalable algorithms for finding frequently occurring patterns in graph data sets, and of algorithms for finding patterns whose frequency decreases as a function of pattern length. (ii) Development of scalable, high-quality clustering algorithms for sequence and graph data sets that operate directly in the native feature space. (iii) Development of scalable and accurate classification algorithms based on automated sequential or relational feature extraction approaches. These algorithms will be validated by analyzing data sets arising in genomics and turbulent fluid flow. The project integrates the data mining research with an educational plan that focuses on introducing undergraduate and graduate students to the computational and data analysis aspects of genomic research, and on developing a comprehensive bioinformatics curriculum whose goal is to foster multidisciplinary research and collaboration.
In addition, a comprehensive set of software tools will be developed and made available that can be used both to train students in using data mining techniques and to conduct novel research expanding the levels of understanding in various scientific disciplines.
