CAREER: Mining Hints from Text Documents to Guide Automated Database Performance Tuning

$480,837 | FY2023 | CSE | NSF

Cornell University, Ithaca NY

Investigators

Abstract

Database management systems, that is, systems that process and manage large data sets, are used widely across virtually all sectors of industry. Their performance depends on a variety of tuning decisions that determine how the system processes data internally. For lay users, it is very hard to find settings that optimize performance. This has motivated the creation of automated database tuning tools that try to find optimal settings for them. However, crucial information for database tuning is often available in the form of natural language text, including, for instance, the database manual, text documents describing data sets, and discussions on database-centric Internet forums. Currently, automated tools are unable to benefit from such text, which makes them inefficient. This project aims to create automated database tuning tools that extract useful tuning information from a variety of text documents. By increasing the quality of automated tuning tools, the project empowers lay users and reduces the need for highly specialized workers in industry, a need that currently causes staff shortages and hampers the adoption of new technology. At the same time, the project aims to create new teaching offerings that help educate the next generation of data professionals.

The project is divided into two primary research thrusts, dedicated to the two categories of text documents that are most useful for database system tuning: text about data sets and text about database management systems. Transformer-based language models will be used to extract relevant information from such text documents. The resulting insights can be used in multiple ways for database tuning: to guide data profiling operations prior to tuning, to refine cost models used for tuning, or to restrict the search space of tuning choices. The project will explore all of those options, combining insights gained from text with other sources of information (e.g., trial runs that result in performance measurements for specific tuning choices). The project will consider a representative set of classical database tuning problems, including, for instance, the problem of selecting auxiliary index data structures to optimally support data processing, as well as the problem of finding optimal values for database system configuration parameters. All project outcomes will be integrated into a software package for automated database tuning that uses text documents as input. This software will be released to the public.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
