AitF: Collaborative Research: Fast, Accurate, and Practical: Adaptive Sublinear Algorithms for Scalable Visualization

$208,678 | FY2019 | CSE | NSF

University Of California-Berkeley, Berkeley CA

Abstract

With the wealth of data being generated in every sphere of human endeavor, data exploration (analyzing, understanding, and extracting value from data) has become absolutely vital. Data visualization is by far the most common data exploration mechanism, used by novice and expert data analysts alike. Yet data visualization on increasingly large datasets remains difficult: even simple visualizations of a large dataset can be slow and non-interactive, while visualizations of a sampled fraction of a dataset can mislead an analyst. The project aims to develop FastViz, a scalable visualization engine that will not only enable visualization of datasets that are orders of magnitude larger in the same amount of time, but also ensure that the resulting visualizations satisfy key properties essential for correct analysis by end users. To ensure immediate utilization, FastViz will be applied to three real-world application domains (battery science, advertising analysis, and genomic data analysis) and implemented in Zenvisage, an open-source visual exploration platform developed by the PIs. Students in the project gain invaluable experience in combining the algorithmic and systems considerations that enable data exploration.

FastViz's development is driven by simultaneous investigation of systems considerations, such as indexing and storage techniques that enable various forms of online sampling, and algorithmic considerations for (a) visualization generation, where the goal is to produce incrementally improving visualizations in which the important features are displayed first, and (b) visualization selection, where the goal is to select, from a collection of not-yet-generated visualizations, those that satisfy desired criteria. On the systems front, FastViz will leverage and contribute back to recent developments in online sampling systems that enable the use of more powerful sampling modalities.
On the algorithms front, FastViz will draw ideas from the testing, distribution learning, and sublinear algorithms literature that, to the best of the PIs' knowledge, have not been adapted in practice. The algorithms developed will come with optimality guarantees and, wherever possible, instance-optimality guarantees, ensuring that they adapt to data characteristics as efficiently as possible. The project will lead to a better understanding of the interplay between sampling algorithm development and systems design, facilitating the adoption of more realistic models and algorithms on the one hand, and the development of more powerful sampling engines that support the models required by those algorithms on the other.
