III: Small: Task-aware Materialization for Fast Data Analytics
University Of Wisconsin-Madison, Madison WI
Investigators
Abstract
The data-to-knowledge pipeline is central to our data-driven world. It consumes data in raw format and then cleans, transforms, integrates, stores, processes, and analyzes the data to obtain knowledge in a usable format. Data analytics pipelines are complex and their workflows typically consist of several simpler tasks chained together. To speed up such pipelines, a ubiquitous optimization technique is to materialize the intermediate result of a task, so that downstream tasks can access the intermediate data as efficiently as possible. Existing materialization techniques suffer from several drawbacks, including prohibitively large cost in terms of storage and preprocessing time. To address these drawbacks, this proposal will develop smart materialization algorithms that can significantly accelerate the performance of data analytics applications. As a result, it will enable data scientists to obtain actionable insights faster and will impact research in areas such as biology, economics, sociology and the medical sciences. The goal of this project is to design, implement and evaluate materialization techniques using three novel ideas: task-aware materialization; fine-grained decisions on what and how to materialize; and multiple design points that can trade off space for time to achieve optimal performance of data analytics pipelines. The project will also explore how to design data structures for materialization that are adaptive to changes in the data and the downstream workload. From a theoretical viewpoint, the proposed research will aim to obtain theoretical guarantees on the tradeoff between space and time for materialization for different tasks. From a practical viewpoint, it will result in implementing and evaluating the developed algorithms on real-world data analytics applications, including visualization through lineage tracking, statistical inference, pattern retrieval in graphs and social network analysis. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
View original record on NSF Award Search →