
Elements: Advanced Lossless and Lossy Compression Algorithms for netCDF Datasets in Earth and Engineering Sciences (CANDEE)

$599,916 · FY2020 · CSE · NSF

University of California-Irvine, Irvine, CA

Abstract

Data compression is used to store and transmit digital data such as music, television, and satellite measurements more efficiently by reducing storage space and download times. The compression software broker that this project provides will facilitate the adoption of modern compression techniques in many branches of science. Compressors come in two flavors: lossless, which perfectly preserve the original information, and lossy, which irretrievably discard parts of the "signal" to further improve compression. Recent improvements in the efficiency, speed, and fidelity of both lossless and lossy compression are striking and will benefit critical research areas by permitting researchers to simulate, store, and analyze phenomena such as stellar evolution, chemical reactions, and hurricane formation in finer detail than before, at no extra storage cost. Since digital storage consumes power, better compression also reduces power consumption and associated greenhouse gas emissions. This project will develop the software infrastructure necessary for scientific researchers to seamlessly shift their applications to produce and use data stored with state-of-the-art lossless techniques and with new lossy techniques that are more accurate than any others. The two most widely used self-describing dataset storage formats, HDF5 and netCDF4, support by default only one patent-unencumbered lossless compression format, the venerable DEFLATE algorithm standardized in the 1990s. Our project will develop a dynamic and extensible software library of modern COmpressors and DECompressors (codecs) for scientific data called the Community Codec Repository (CCR). We will populate the CCR with cutting-edge open-source compression technology, including the LZ4, Facebook's Zstandard, and Google's Snappy codecs, and will implement default netCDF support for the CCR.
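The idea of an extensible codec library with lossless round trips can be illustrated with a minimal Python sketch. It is not the CCR API: the `CODECS` registry and the `round_trip` helper are hypothetical names, and the standard-library DEFLATE, bzip2, and LZMA codecs stand in for LZ4, Zstandard, and Snappy, which are not part of the Python standard library. The input is synthetic.

```python
import bz2
import lzma
import struct
import zlib

# Hypothetical registry (not the CCR API): codec name -> (compress, decompress).
# Standard-library codecs stand in for LZ4, Zstandard, and Snappy.
CODECS = {
    "deflate": (zlib.compress, zlib.decompress),
    "bzip2": (bz2.compress, bz2.decompress),
    "lzma": (lzma.compress, lzma.decompress),
}


def round_trip(name, payload):
    """Compress then decompress payload; return (compressed size, restored bytes)."""
    compress, decompress = CODECS[name]
    packed = compress(payload)
    return len(packed), decompress(packed)


# Synthetic "measurements": 10,000 float32 values that repeat every 40 samples.
values = [20.0 + 0.25 * (i % 40) for i in range(10_000)]
payload = struct.pack(f"<{len(values)}f", *values)

for name in CODECS:
    size, restored = round_trip(name, payload)
    assert restored == payload  # lossless: every byte is recovered
    print(f"{name:8s} {len(payload)} -> {size} bytes")
```

Adding a codec in this sketch means adding one dictionary entry, which is the sense in which such a library is extensible; how the CCR itself registers codecs is not described in the abstract.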
Sequential lossy-then-lossless compression improves both the size and speed of compression/decompression yet is currently tedious to perform. We will implement a user-friendly method to "chain" codecs into sequential operations in memory (no intermediate files required) in our widely used netCDF Operators software package. We will also produce a new precision-preserving lossy codec, Granular Bit Grooming, with an unsurpassed compression ratio and statistical accuracy. Technical success will be evaluated by the size and speed improvements of compressing a prototypical geoscience/engineering "big data" project, the Coupled Model Intercomparison Project Phase 6 (CMIP6). This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
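To make the lossy-then-lossless chain concrete, here is a minimal in-memory sketch in Python. It is an assumption-laden illustration, not the project's implementation: the lossy step is plain mantissa truncation (often called bit shaving), a much simpler technique than Granular Bit Grooming, and the function names `shave_bits`, `chain`, and `unchain`, the choice of 12 retained mantissa bits, and the synthetic data are all hypothetical.

```python
import math
import struct
import zlib


def shave_bits(values, keep_bits):
    """Lossy step: keep only the top keep_bits of each float32's 23 mantissa bits."""
    mask = (0xFFFFFFFF << (23 - keep_bits)) & 0xFFFFFFFF
    raw = struct.pack(f"<{len(values)}f", *values)
    words = struct.unpack(f"<{len(values)}I", raw)
    return struct.pack(f"<{len(values)}I", *(w & mask for w in words))


def chain(values, keep_bits):
    """Lossy then lossless, entirely in memory: shave bits, then DEFLATE."""
    return zlib.compress(shave_bits(values, keep_bits))


def unchain(blob, count):
    """Undo the lossless step only; the shaved bits cannot be recovered."""
    return list(struct.unpack(f"<{count}f", zlib.decompress(blob)))


# Synthetic field: a slow oscillation plus small high-frequency variation.
values = [280.0 + 15.0 * math.sin(0.001 * i) + 0.01 * math.cos(1.7 * i)
          for i in range(20_000)]

lossless_only = zlib.compress(struct.pack(f"<{len(values)}f", *values))
chained = chain(values, keep_bits=12)
restored = unchain(chained, len(values))
worst = max(abs(a - b) / abs(a) for a, b in zip(values, restored))

print(f"DEFLATE only:         {len(lossless_only)} bytes")
print(f"shave + DEFLATE:      {len(chained)} bytes")
print(f"worst relative error: {worst:.2e}")
```

Zeroing the trailing mantissa bits leaves long runs of identical bytes, which is what lets the lossless stage compress the shaved data further than it can compress the original. The price is a bounded relative error, here below 2^-12 per value apart from float32 rounding.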
