CAREER: Behavior-Driven Testing of Big Data Exploration Tools

$606,541 | FY2022 | CSE | NSF

University Of Washington, Seattle WA


Abstract

This award is funded in whole or in part under the American Rescue Plan Act of 2021 (Public Law 117-2). Companies, governments, and institutions all over the world use massive datasets to make decisions that affect our daily lives, such as deciding which investments to prioritize to maximize a company's growth. However, big data is only valuable when it can provide useful insights, which analysts often seek to extract from data visualizations. Much like mastering a new recipe, it takes effort and skill to process data and design effective visualizations, and analysts are increasingly turning to computational tools to help them visualize massive data and process it into a form suitable for consumption. Evaluating these tools is challenging, however, because they are used for a wide variety of problems by a wide variety of people, and standard tool benchmarks are not equipped to handle this variation. The vision for this project is to develop better ways to evaluate tools as they are used in the wild: if we can automate the way we evaluate data exploration tools, then we can test new tools as soon as they are created, tune them to real workloads, and help analysts be more efficient and effective at generating insights. To this end, the research team will develop automated testing software that can determine (1) whether a data exploration tool is capable of helping someone achieve the particular goals they have in exploring their data, and (2) what problems these tools and evaluation methods may introduce into the data exploration process. The team will also work with leading visualization researchers and software companies to fine-tune the software and maximize its impact, and will develop new programs to help students learn fundamental visualization and research skills.

To make the envisioned software feasible, the research objective of this project is to formally specify an analyst's goals in exploring a dataset and to measure whether a given system helps or hinders the analyst's ability to achieve these specified data exploration goals. During data exploration, analysts visually and interactively query their data to help their organization make informed decisions. The research will be conducted in three phases. Phase 1 will theoretically and programmatically define a person's exploration intent at different granularities (e.g., goals, sub-tasks, interaction patterns). Phase 2 will integrate foundational theory in human-computer interaction with artificial intelligence path-planning methods to generate a valid sequence of user interactions that achieves a programmatically defined intent. Phase 3 will extend the models from Phase 2 to develop customizable performance testing software that can simulate how people alternate between goal-directed interactions (i.e., following a planned sequence) and open-ended interactions (i.e., exploring alternative analyses). The research team will implement the framework as an open-source platform so others can use the findings to evaluate their own systems and data exploration use cases. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
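The abstract does not say how exploration intents will be encoded or which path-planning method will be used. The Python sketch below is a hypothetical illustration only, not the project's design. It assumes an intent is a goal label plus sub-tasks, each sub-task being a set of interface "facts" that must hold. It models interactions as operations that add facts and substitutes plain breadth-first search for the unspecified AI planner (Phase 2). It also mixes planned steps with random open-ended steps, loosely in the spirit of Phase 3. Every name here (Interaction, Intent, plan, simulate, explore_prob, and the toy interactions) is invented for illustration.

```python
import random
from collections import deque
from dataclasses import dataclass

# Hypothetical toy model, not the project's design: an interface state is a
# frozenset of facts (e.g. "filtered"), and each interaction only adds facts.


@dataclass(frozen=True)
class Interaction:
    name: str
    requires: frozenset  # facts that must hold before the interaction is available
    adds: frozenset  # facts that hold after the interaction is performed


@dataclass(frozen=True)
class Intent:
    goal: str  # high-level goal, kept only as a label
    subtasks: tuple  # each sub-task is a frozenset of facts that must all hold

    def satisfied(self, state):
        return all(sub <= state for sub in self.subtasks)


def available(state, interactions):
    return [i for i in interactions if i.requires <= state]


def plan(start, intent, interactions):
    """Breadth-first search for a shortest interaction sequence that satisfies intent."""
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        state, path = frontier.popleft()
        if intent.satisfied(state):
            return path
        for step in available(state, interactions):
            nxt = state | step.adds
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, path + [step.name]))
    return None  # intent is unreachable with these interactions


def simulate(start, intent, interactions, explore_prob=0.3, max_steps=20, seed=0):
    """Mix goal-directed steps (first step of a fresh plan) with random open-ended ones."""
    rng = random.Random(seed)
    by_name = {i.name: i for i in interactions}
    state, trace = start, []
    for _ in range(max_steps):
        if intent.satisfied(state):
            break
        options = available(state, interactions)
        if options and rng.random() < explore_prob:
            step = rng.choice(options)  # open-ended: any currently available interaction
        else:
            path = plan(state, intent, interactions)
            if not path:
                break  # nothing left to plan toward
            step = by_name[path[0]]  # goal-directed: follow the planned sequence
        state = state | step.adds
        trace.append(step.name)
    return trace, intent.satisfied(state)


# Toy interface and intent (hypothetical): compare regions within one year.
interactions = [
    Interaction("load_data", frozenset(), frozenset({"loaded"})),
    Interaction("filter_year", frozenset({"loaded"}), frozenset({"filtered"})),
    Interaction("group_by_region", frozenset({"loaded"}), frozenset({"grouped"})),
    Interaction("bar_chart", frozenset({"grouped"}), frozenset({"chart"})),
    Interaction("map_view", frozenset({"loaded"}), frozenset({"map"})),
]
intent = Intent(
    "compare regions within one year",
    (frozenset({"filtered"}), frozenset({"chart"})),
)

print(plan(frozenset(), intent, interactions))
print(simulate(frozenset(), intent, interactions, seed=1))
```

Setting explore_prob to 0 reduces the simulation to replaying the planned sequence. Raising it makes simulated sessions wander before reaching the goal, which, under this toy model, is the kind of variation a performance test would need to cover.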
