AF: SMALL: Relational Algorithms

$258,780FY2022CSENSF

University Of Pittsburgh, Pittsburgh PA

Investigators

Abstract

The use of relational databases, which store data in relational form in multiple tables, is a ubiquitous technology among American businesses and organizations. The four most commonly used databases (Oracle, MySQL, Microsoft SQL Server, PostgreSQL) are all relational. The use of machine learning to glean competitive insights from their data is also a ubiquitous technology among American businesses and organizations. Thus machine-learning tasks on relational data are ubiquitous. However, essentially all standard machine-learning algorithms require the input be in the form of a single table, and thus these algorithms can not operate directly on data hat is stored in relational form. Further, it is far from obvious if and how one can adapt many of these machine-learning algorithms to efficiently operate on data in relational form. Thus the standard practice is to first join together multiple tables inside the relational database into a single larger table, and then export this table to a standard machine-learning package. Joining tables is a time-consuming operation, requiring exponential time and memory in the worst case. The goal of this project is to develop relational algorithms, which are efficient algorithms that work directly on the relational tables and thus avoid expensive join operations, for common machine-learning problems. More efficient algorithms for standard machine learning problems on relational data would allow American companies and organizations to more efficiently glean competitive insights from their relational data. There are four goals for this project. The first goal is to develop relational algorithms for basic geometric problems that underlie many common machine-learning problems. These algorithms can then be used as the building blocks to develop relational algorithms for more complicated machine-learning problems. The second goal is to develop relational algorithms for standard machine-learning problems. Plan A is to design a relational implementation of the standard algorithm, plan B is to design an alternate relational algorithm with the same performance guarantee as the standard algorithm, and plan C is to design a relational algorithm with an alternate reasonable performance guarantee. The third goal is to develop foundational algorithmic-design and -analysis techniques. The fourth goal is to understand the limitations of relational algorithms as there may well be problems where relational algorithms are not possible. The investigators will determine which of the myriad algorithmic tools in the standard algorithmic toolkit are of utility in developing relational algorithms. In some cases the application of a known technique will require some novelty. In some cases no existing algorithmic tool will do the job, and the investigators will invent new algorithmic-design and -analysis techniques. As is the norm in algorithmic foundations research, the recognition that a particular algorithm-design or -analysis technique is of wide applicability, and the understanding of the limitations of these techniques occurs naturally during the process of the design and analysis of algorithms for specific problems. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

View original record on NSF Award Search →