SGER: Discovery of Research Trends Using Concept Extraction and Data Mining Techniques in domain-specific Text: Application to Nanoscale Science and Engineering Field.
George Washington University, Washington DC
Abstract
National Science Foundation - Division of Chemical & Transport Systems
Particulate & Multiphase Processes Program (1415)
Proposal Number: 0737961
Principal Investigator: Bellaachia, A.
Affiliation: George Washington University
Proposal Title: SGER: Discovery of Research Trends Using Concept Extraction and Data Mining Techniques in domain-specific Text: Application to Nanoscale Science and Engineering Field

The purpose of this project is to conduct exploratory research, employing knowledge data discovery techniques, that will advance the state of the art for extracting, analyzing, understanding, and digesting information about a complex research area from large semi-structured data.

Searching for documents on a given topic currently consists of little more than keyword matching. In the past this technique was successful because the number of documents that could be returned was limited. However, now that there are trillions of possible documents that fit simple keyword searches, a more effective methodology needs to be developed. Concept extraction is a possible solution to this growing problem. Concept extraction is the process of examining a document programmatically and determining its subject or key ideas.

This research will use concept extraction and apply data mining techniques to analyze the online NSF awards, with a focus on the nanoscale science and engineering awards. Noun phrases will be extracted from award proposals using existing tools such as General Architecture for Text Engineering (GATE) [10]. The list of noun phrases will be used to describe the content of each award. Two main issues will be addressed in this project: (1) the discovery of topics and research trends, and (2) the classification of data according to these topics. The evaluation of our system will be conducted using the online NSF awards and will target the nanoscale science and engineering awards.
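As a rough illustration of the pipeline described above (phrases extracted per award, then tracked over time), the following Python sketch may help. It is not the project's method: the abstract names GATE for noun-phrase extraction, whereas this sketch substitutes a crude stopword-splitting heuristic so that it runs with the standard library alone. The function names, the stopword list, and the sample award records are all hypothetical.

```python
import re
from collections import Counter, defaultdict

# Hypothetical, deliberately tiny stopword list. A real system would rely on
# a part-of-speech tagger or noun-phrase chunker (the abstract names GATE).
STOPWORDS = {
    "a", "an", "and", "are", "as", "by", "for", "from", "in", "is", "of",
    "on", "or", "the", "this", "to", "we", "will", "with", "using", "study",
}


def candidate_phrases(text):
    """Approximate noun phrases: runs of 2+ non-stopwords between
    stopwords or punctuation. This is a stand-in heuristic, not GATE."""
    phrases = []
    for fragment in re.split(r"[^a-z0-9\- ]+", text.lower()):
        run = []
        for word in fragment.split():
            if word in STOPWORDS:
                if len(run) > 1:
                    phrases.append(" ".join(run))
                run = []
            else:
                run.append(word)
        if len(run) > 1:
            phrases.append(" ".join(run))
    return phrases


def phrase_counts_by_year(awards):
    """awards: iterable of (year, abstract_text) pairs.
    Returns {phrase: Counter({year: number_of_awards_mentioning_it})}."""
    counts = defaultdict(Counter)
    for year, text in awards:
        # set() so each award contributes at most once per phrase
        for phrase in set(candidate_phrases(text)):
            counts[phrase][year] += 1
    return counts


# Made-up records for illustration only; not actual NSF award data.
awards = [
    (2005, "We study carbon nanotubes for the drug delivery."),
    (2006, "Carbon nanotubes and quantum dots in solar cells."),
    (2006, "Synthesis of carbon nanotubes using quantum dots."),
]
trend = phrase_counts_by_year(awards)
for phrase in sorted(trend):
    print(phrase, dict(trend[phrase]))
```

A phrase whose per-year count rises over time would be one simple candidate signal for an emerging topic. The same phrase lists could also serve as features for classifying awards by topic, though the abstract does not specify which algorithms the project uses for either step.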
Intellectual Merit
This research addresses problems and opportunities presented by the increasingly complex, large semi-structured data available in business, science, and a range of other domains. Current searching techniques do not provide intuitive mechanisms to navigate through different topics in the dataset. The most significant contribution from our previous effort was the establishment of a database that stores all nanoscale science and engineering awards and offers several functionalities. The research objective of this project is to implement a data mining tool that detects emergent research trends in nanoscale science and engineering. This tool will use algorithms suited to this type of application domain.

Broader Impact
The broader implications of the proposed work are many and significant. Our research can be applied to other domains such as bioinformatics. The following broader implications of our research relate to the educational process:
- The proposed activity will provide support for graduate and/or undergraduate students.
- The findings of this research will be disseminated broadly via conferences and/or journal publications, as well as lectures and seminars as opportunities arise.
- Finally, the methodology followed in this research will be shared with the students of my data mining class.