GGrantIndex
← Search

III: Small: COMPASS: Online Sketch-based Query Optimization for In-Memory Databases

$499,870FY2020CSENSF

University Of California - Merced, Merced CA

Investigators

Abstract

The query optimizer is a core component of database servers, which represent one of the most successful products of the software industry, adopted massively both across business enterprises and in scientific projects ranging from astronomy to genomics. Despite this success and decades of work, query optimization is still far from solved. The main reasons are the complexity of the problem and the fast pace of hardware development, which makes query optimization a continuously moving target. In this project, the researchers investigate how to design COMPASS, a lightweight, yet effective, query optimizer for modern databases based on two design principles. The first principle is to capitalize on highly-parallel computing architectures in query optimization, while the second is to simplify the type and number of synopses included in the optimizer. The final goal is to build COMPASS, an open-source query optimizer that can be integrated into existing and novel database servers. Due to the extensive use of databases across many domains of modern life, optimal querying can bring benefits to the entire society. COMPASS is an online query optimizer that uses sketch synopses exclusively in order to find optimal execution plans. Sketches are correlated synopses for cardinality estimation that use small space, can be computed efficiently in a single scan over the data, are linearly composable, and have statistically high accuracy. COMPASS uses the parallel execution engine in modern databases to compute sketches at runtime. This is realized by decomposing query processing into two stages, performed before and after optimization. In the first execution stage, selection predicates are pushed-down and sketches are built only over the relevant tuples. Plan enumeration is performed over the join graph by incrementally composing two-way join sketches in order to estimate the cardinality of multi-way joins. The plan is executed in the second processing stage. The holistic COMPASS approach introduces novel methods in all the components of the query optimizer---cardinality estimation for selections, two-way, and multi-way joins; plan enumeration; and cost models. In addition to the algorithmic aspects, these methods involve heavy engineering practices on highly-parallel architectures. Specifically, parallel random number generation schemes go well beyond sketches due to their application to many other data processing tasks. This is also applicable to graph traversal algorithms. The generalization of sketches to multi-way join estimation has intellectual value by itself because this is a theoretical open problem. Since sketches are streaming algorithms at origin, the contributions made in this project are also directly applicable to this area. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

View original record on NSF Award Search →