SHF: Small: MIGS -- Efficiently Evaluating Multiple Iterative Graph Queries

$500,000FY2020CSENSF

University Of California-Riverside, Riverside CA

Investigators

Abstract

Since graphs can readily represent entities and relationships among them, they are widely used to represent large volumes of data from domains ranging from social networks to biological and brain networks. Mining of large graphs for developing insights from data often takes the form of evaluating iterative graph analytics queries that originate at different points in the graph and compute global property values that are potentially dependent upon the structure of the entire graph. Such algorithms are both compute- and data-intensive; optimizing them is a challenge due to the large size and highly irregular structure of real-world graphs. While many systems for evaluating iterative graph queries have been developed, their focus is on minimizing the execution time of a single query though in practice many queries need to be evaluated. The goal of this research is to dramatically improve evaluation times of a group of queries by synergistically evaluating them to exploit the observation that evaluating individual queries involves graph traversals and computations that greatly overlap with each other. This research helps discover novel techniques for identifying and exploiting such overlap to reduce redundant costs and overheads associated with parallel executions on single-machine and multiple-machine platforms. Building such powerful systems accelerates new discoveries in fields that employ graph analytics. In addition, broader impact also results from training students in an area of national need. Specifically, this research develops both system level optimizations and algorithmic innovations that work together to greatly increase the scalability of simultaneously evaluating multiple iterative queries. To guarantee versatility of techniques produced, the new optimizations and algorithms are being realized both in the context of a shared-memory multicore machine as well as a distributed cluster of multiple machines. Versatility is also achieved by supporting different kinds of queries (point-to-point and point-to-all) and different forms of graphs (fixed graphs and streaming or dynamic graphs). Each query type and graph type pose unique challenges to developing algorithms that are both correct and achieve high-performance. To exploit the overlap of work performed during evaluation of different queries, reuse-based algorithms are being developed while to update the results of queries in response to changes in the graph, incremental algorithms are being devised. These approaches are aimed at enabling high-performance and high system utilization on a shared-memory machine. To further scale performance to large batches of queries, distributed algorithms are being developed to utilize the resources from multiple machines in a distributed cluster. The evaluations utilize large graph data sets from public repositories such as KONNECT and SNAP. The software developed is made available to other researchers over the course of this project. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

View original record on NSF Award Search →