III: Medium: Collaborative Research: Reasoning about Optimizers for Data-Intensive Systems
University Of California-Berkeley, Berkeley CA
Investigators
Abstract
Today, we witness an explosion of new data-intensive systems, both for traditional data processing and for machine learning, and these systems critically require powerful optimizers for their declarative languages. Developing and verifying such optimizers is very difficult: in the past, they were developed only by a small number of commercial database vendors with dedicated teams, while most modern systems are developed by small teams without such expertise. To address this challenge, this project studies and implements automated methods for verifying query optimization rules in data-intensive systems. Specifically, this project will have four research focuses: (1) We will develop an axiomatic foundation based on algebraic identities in a semiring, implement it as a framework, and apply it to verify optimization rules in existing systems. (2) We will extend the framework to reason about languages that combine linear algebra with relational algebra, for example languages that manipulate vectors, matrices, and tensors, and extend the verifier to reason about tensor optimization rules. (3) We will conduct a theoretical study of the completeness and decidability of the semiring-based axiomatic system used for verifying optimization rules, and specialize this study for various query language fragments. (4) We will build a new cloud-based infrastructure for automated reasoning of declarative query languages, to enable researchers to easily develop executable semantics for different data-intensive systems, formal methods researchers to develop new techniques targeted for query language reasoning, and application developers to build new applications that make use of our infrastructure. All software artifacts developed in this project will be released to the public, with plans to incorporate their usage in both the undergraduate and graduate curricula. Any collected benchmarks from open source will be aggregated into a repository that is publicly accessible, with the goal to enable researchers and practitioners in the field to experiment and reproduce the results. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
View original record on NSF Award Search →