III: Small: Automatic Learning-based Services for Distributed Data Management Systems

$499,913 · FY2018 · CSE · NSF

Brandeis University, Waltham MA

Investigators

Abstract

The distributed nature of most database management systems (DBMSs) brings new challenges to the daily tasks of database administrators. Apart from database tuning, today's administrators are often responsible for: (a) adapting data partitioning and replication schemes; (b) managing dynamic workloads to meet query prioritization and performance goals; and (c) provisioning computing resources on demand. Unfortunately, the complexity of these tasks often exceeds engineers' abilities to scientifically model them. To address this challenge, this project will couple existing learning-driven theory with distributed data management systems to have a significant impact on the design of DBMSs. By leveraging advanced learning algorithms from machine learning and game theory, the project will allow DBMSs to move away from "hard-coded algorithmic intelligence," rigid data structures, and algorithms that are based on informal intuition. Instead, DBMSs will be able to incorporate "learning-based intelligence" that provides reasoning on numerous decisions based on pattern recognition capabilities and mathematically proven insights. Leveraging information to learn and adapt could unleash tremendous potential as databases evolve into systems capable of automatically tuning themselves. The results of this project will also reduce the human effort of database administrators, as they will be offered predictive models, decision tools, and insight into the interplay between data distribution, workload management, and query performance. The project will provide solutions to some of the key technical challenges that arise when tuning the performance of dynamic workloads on distributed data management systems. This research will lead to the design of new algorithms and learning-based frameworks for supporting data distribution, replication, query dispatching, query scheduling, and performance prediction without human intervention.
These techniques will be automatically customized to application-level performance goals and user-defined query priorities and will allow distributed DBMSs to naturally handle dynamic workloads, changing data access patterns, and varying resource availability. By coupling unsupervised and supervised learning, deep learning, and game theory with data management tasks, the project will transform data management systems into "database science" tools through which system administrators will be able to explore and derive insight into the factors that affect the performance of a database and its deployed applications. As a result, the project will deliver complex predictive and correlation models between query-, data-, and resource-related features and system performance. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
