III: Small: Regret-Bounded Query Evaluation via Reinforcement Learning

$499,999FY2019CSENSF

Cornell University, Ithaca NY

Investigators

Abstract

Database systems try to shield their users from the intricacies of data processing. Users specify queries at a high level of abstraction, describing desired results rather than the way in which they are generated. Hence, given a query, database systems need to find efficient data processing plans automatically. Those plans are generated based on simplifying assumptions about queries and data. All too often, those assumptions turn out to be overly simplistic and generated plans do not finish within reasonable amounts of processing time. In such cases, efficient processing plans must be generated manually, based on expert knowledge. This creates high overheads for organizations which employ database experts. Laymen users who have no access to such expertise may be unable to execute certain queries altogether. The proposed project will overcome those challenges by a novel approach to plan generation. This approach makes no simplifying assumptions and therefore guarantees near-optimal plans. The proposed work will explore the potential for "intra-query learning," a new approach combining query execution and plan generation. Intra-query learning divides the execution of a single query into many micro-episodes in which different plans are tried. Each episode serves two purposes. First, it generates query result fragments that will be collected to form complete query results. Second, it yields information on the quality of plan alternatives. This information will be leveraged to select better plans for the remaining episodes. The project will use methods from the area of reinforcement learning to select plans in each episode. Those methods offer formal guarantees on making near-optimal decisions under uncertainty. This project will translate such guarantees into guarantees on near-optimal expected processing cost. The research outcomes will be integrated into SkinnerDB, a novel database system designed from the ground up for robust performance without manual interventions. SkinnerDB will entirely abandon tools such as coarse-grained data statistics, or simplifying cost and cardinality models, that are traditionally used to select query plans. Instead, it will rely exclusively on reinforcement learning in combination with an execution engine that is tailored to the needs of intra-query learning. This will enable it to learn near-optimal plans from scratch even for queries that execute on freshly loaded data or contain newly introduced user-defined functions. Starting from a first approach showing promising performance, the project will explore various extensions such as parallel and distributed processing, query plan compilation, and disk-based data processing. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

View original record on NSF Award Search →