CRII: III: Scalable Noise-filtering and Community Queries on User-generated Data
University Of California-Riverside, Riverside CA
Investigators
Abstract
This project investigates novel indexing and querying techniques that enable scientists to analyze and extract meaningful data from large repositories of user-generated data. The need for such techniques is significant, especially for social media data, which is the major repository of user-generated data and the largest archived and real-time source of information on human behavior. Scientists therefore use this data widely in disciplines as disparate as sociology, behavioral sciences, education, spatial sciences, food sciences, medical studies, and political sciences. This project focuses on innovative indexing and querying techniques that enable scientists to exploit user-generated data effectively at a large scale. The planned research adds new data management infrastructure modules to support: (1) Scalable noise-filtering queries, a subset of selection queries that are needed repeatedly and are expressed in SQL-based systems as multiple nested queries, which is inefficient for large datasets. To support these queries, the project investigates techniques for: (a) Advanced query conjunctions, e.g., BUT-NOT and EITHER-XOR, to scale up complex-predicate queries beyond the basic search queries that are currently supported in data management systems. (b) Scalable contextual scoring of data records, e.g., based on sentiment or semantics, so that irrelevant records are pruned early and the search space is reduced significantly. (2) Scalable community-centric queries that enable scientists to ask queries about communities with large numbers of users while obtaining real-time query responses beyond what is currently supported by graph data management technology. The project investigates indexing, query processing, and storage optimization techniques that scale up such queries at the system level. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
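As an illustration only, the intended semantics of the BUT-NOT and EITHER-XOR conjunctions named in the abstract can be sketched as set operations over a toy in-memory keyword index. The abstract does not describe the project's actual index structures or algorithms; the function names (`build_index`, `but_not`, `either_xor`), the whitespace tokenization, and the sample records below are all assumptions made for this sketch.

```python
from collections import defaultdict


def build_index(records):
    """Map each lowercase token to the set of record ids containing it.

    Hypothetical helper: a toy inverted index, not the project's design.
    """
    index = defaultdict(set)
    for rec_id, text in records.items():
        for token in text.lower().split():
            index[token].add(rec_id)
    return index


def but_not(index, keep, drop):
    """Records that mention `keep` but do not mention `drop`."""
    return index.get(keep, set()) - index.get(drop, set())


def either_xor(index, a, b):
    """Records that mention exactly one of `a` and `b`."""
    return index.get(a, set()) ^ index.get(b, set())


# Made-up sample posts, keyed by record id.
records = {
    1: "flu vaccine clinic open today",
    2: "flu season jokes",
    3: "vaccine side effects thread",
    4: "clinic parking is terrible",
}
index = build_index(records)

flu_without_jokes = but_not(index, "flu", "jokes")
flu_xor_vaccine = either_xor(index, "flu", "vaccine")
```

Each operator here is answered with a single set operation over the index rather than a nested subquery, which is the kind of simplification the abstract motivates; scaling this to large, real-time repositories is the research problem itself and is not addressed by this sketch.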