Elements:Software:NSCI: Empowering Data-driven Discovery with a Provenance Collection, Management, and Analysis Software Infrastructure
Texas Tech University, Lubbock TX
Investigators
Abstract
Scientific breakthroughs are increasingly powered by the advanced computing and data analysis capabilities of high performance computing (HPC) systems. At the same time, many scientific problems have reached a level of complexity at which the ability to understand results, audit how a result was generated, and reproduce important experiments or simulation results is critical to scientists. Enabling this capability in HPC systems requires a holistic software infrastructure for collecting, managing, and analyzing "provenance" data, the metadata that describes the history of a piece of data. Such a software infrastructure does not yet exist, which motivates the proposed development of a lightweight provenance service. With such a software element, many advanced data management functions become possible, such as identifying the data sources, parameters, or assumptions behind a given result, auditing data history and usage, and understanding in detail how different input data are transformed into outputs. Responding to the National Strategic Computing Initiative, this project will provide an attractive software infrastructure for future national HPC systems to improve the productivity of science in complex HPC simulation and analysis cycles. The project team will also recruit underrepresented students, mentor graduate and undergraduate students, integrate results into the curriculum, and publish and disseminate results.

The lightweight provenance service software on HPC systems will: 1) run as an always-on background service that automatically and transparently collects and manages provenance for scientific applications, 2) capture comprehensive provenance with accurate causality to support a wide range of use cases, and 3) provide easy-to-use analysis tools that let scientists quickly explore and use the provenance. This project will tightly integrate its development, education, and outreach efforts.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.