EAGER: SciDatBench: Principles and Prototypes of Science Data Benchmarks
Indiana University, Bloomington IN
Investigators
Abstract
Analyzing large scientific datasets requires new research in both data analysis methods and the information technology hardware and software used to run them. This project is investigating and prototyping a new set of science data benchmarks, dubbed SciDatBench. It establishes a collection of important and representative big scientific datasets together with typical software implementations of the machine learning algorithms needed for best-practice analysis. The collection is accompanied by documentation so that it can be used to train researchers in rapidly evolving Big Data analysis techniques. The project has the potential to impact a broad range of scientific disciplines, eventually including materials science, environmental sciences, life sciences (including epidemiology), fusion, particle physics, astronomy, earthquake science, and earth sciences, with more than one representative problem from each domain. SciDatBench generates particular instances of big data analysis benchmarks and establishes a sustainable process for maintaining and enhancing them. The collection includes both standalone examples and end-to-end examples that require multiple components, as seen in the analysis of many science experiments. SciDatBench is affiliated, as an approved Science Data working group, with the highly successful MLPerf activity, whose 80 organizational members focus on industry machine learning benchmarks. The state-of-the-art examples in SciDatBench contribute to progress in scientific discovery that advances the national health, prosperity, and welfare, as stated in NSF's mission. The project proactively involves under-represented communities in its activities. SciDatBench supports comparative studies and identifies requirements for future cyberinfrastructure to support scientific data analysis.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.