Collaborative Research: OAC Core: CEAPA: A Systematic Approach to Minimize Compression Error Propagation in HPC Applications
University Of Iowa, Iowa City IA
Investigators
Abstract
Today’s high-performance computing (HPC) applications produce vast volumes of data for post-analysis, placing a major storage and I/O burden on HPC systems. To reduce this burden, researchers have explored the use of lossy compression techniques. While lossy compression can effectively reduce data size, it also introduces errors into the data that often lead to incorrect computation results. As a result, scientists hesitate to use lossy compression in their research. Thus, there is a critical need for an effective method to identify compression strategies that minimize error impact across diverse programs. This project aims to develop a systematic approach that helps scientists automatically select the lossy compression algorithm with the lowest error impact based on their HPC programs and target compression ratios. It also integrates educational and outreach activities, including student training and the development of new curriculum on trustworthy data reduction and dependable HPC systems. Modeling compression error propagation in HPC programs is challenging because existing lossy compressors are built on distinct principles and therefore generate very different compression errors on diverse HPC data.
This project includes four key thrusts: (1) developing an accurate and efficient fault injection infrastructure that integrates with the fault models of commonly used lossy compression algorithms; (2) designing a fine-grained approach to characterize error propagation in HPC programs through program analysis and decomposition based on the data dependencies and life cycle of compressed data; (3) developing a predictive model using machine learning techniques to select a compression strategy that minimizes the error impact for a given program and compression ratio; and (4) integrating the technique with domain-specific error impact metrics in real-world HPC applications and demonstrating its effectiveness by selecting compression strategies that give low error impact at the same compression ratios. Not only does this project have a substantial positive impact on HPC cyberinfrastructure, but it also helps redefine the optimization of lossy compression techniques with an emphasis on both efficiency and error impact. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
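As an illustration of the idea behind thrusts (1) and (2), the following minimal sketch injects a bounded compression-like error into input data and measures how much of it survives a computation. It is not the project's infrastructure: the uniform quantizer stands in for an error-bounded lossy compressor, the smoothing loop stands in for an HPC kernel, and the names `quantize`, `smooth`, and `error_impact` are hypothetical.

```python
import numpy as np

def quantize(data, error_bound):
    # Assumed stand-in for an error-bounded lossy compressor:
    # uniform quantization with pointwise absolute error <= error_bound.
    step = 2.0 * error_bound
    return np.round(data / step) * step

def smooth(field, steps):
    # Toy kernel (an assumption, not a real HPC application):
    # repeated 3-point averaging with periodic boundaries.
    out = field.copy()
    for _ in range(steps):
        out = (np.roll(out, 1) + out + np.roll(out, -1)) / 3.0
    return out

def error_impact(field, error_bound, steps):
    # Maximum absolute deviation in the kernel output that is
    # caused solely by perturbing the input with quantization error.
    reference = smooth(field, steps)
    perturbed = smooth(quantize(field, error_bound), steps)
    return float(np.max(np.abs(perturbed - reference)))

rng = np.random.default_rng(0)
field = rng.normal(size=1024)
for eb in (1e-1, 1e-2, 1e-3):
    print(f"error bound {eb:g}: output deviation {error_impact(field, eb, steps=10):.3e}")
```

Because averaging is a convex combination, this toy kernel can only damp the injected error; real applications may instead amplify it, which is why the impact has to be characterized per program.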