Graph-Based Data Mining

$452,487 | FY2001 | CSE | NSF

University of Texas at Arlington, Arlington, TX

Investigators

Diane J. Cook (contact), Lawrence B. Holder, Sharma Chakravarthy

Abstract

With the increasing amount and complexity of today's data, there is an urgent need to accelerate the development of knowledge discovery and concept learning methods for mining large databases. Furthermore, much of this data is structural in nature; that is, it is composed of entities and relationships between those entities. Hence, there exists a need to develop scalable methods for discovering new knowledge in structural databases. The main objective of this project is to investigate and implement new methods for performing knowledge discovery and concept learning on structural databases represented as graphs. This work builds upon existing methods for graph-based knowledge discovery implemented in the Subdue structural discovery system. The graph-based discovery algorithm is extended to perform structural concept learning and structural, hierarchical conceptual clustering. To achieve greater scalability, database management techniques are integrated into the graph-based discovery and learning processes. One targeted application is the use of Subdue as the core of a structural Web search engine. Domain experts provide guidance and feedback on applications to molecular biology, geology, telecommunications, and software engineering. Achieving the above objectives improves the ability to automatically extract useful knowledge from the ever-increasing amount of data. By disseminating the Subdue discovery algorithm, databases, and discovered results over the Internet, the project enables scientists in all areas to perform similar analyses of their own databases. Through integration of our research ideas into classes taught at UTA and into student research, this project impacts education at UTA and at other universities.
