SZ3 Compression

Vast volumes of scientific data cannot be stored or transferred efficiently because of limited I/O bandwidth, network bandwidth, and storage capacity. Error-bounded lossy compressors are a promising way to reduce scientific data volumes while still meeting users' data fidelity requirements. For example, SZ, ZFP, and MGARD allow users to set an absolute error bound when performing lossy compression, so that the pointwise difference between the original data and the reconstructed data never exceeds that threshold. Climate scientists have verified that the reconstructed data generated by error-bounded lossy compressors are acceptable for post hoc analysis.
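To make the absolute error bound concrete, here is a minimal Python sketch. It is not the SZ, ZFP, or MGARD API. The lossy step is a toy uniform quantizer, and the names `toy_lossy_roundtrip` and `max_abs_error` are hypothetical helpers introduced only for illustration. It shows the property a user would check: every reconstructed value lies within the chosen bound of the original.

```python
def toy_lossy_roundtrip(values, error_bound):
    """Toy lossy step (an assumption, not a real compressor):
    snap each value to the nearest multiple of 2 * error_bound."""
    step = 2.0 * error_bound
    return [round(v / step) * step for v in values]


def max_abs_error(original, reconstructed):
    """Largest pointwise difference between original and reconstructed data."""
    return max(abs(a - b) for a, b in zip(original, reconstructed))


data = [0.013, 1.273, -3.141, 2.718, 100.504]
eb = 0.01  # user-chosen absolute error bound

recon = toy_lossy_roundtrip(data, eb)
worst = max_abs_error(data, recon)
print(f"max abs error = {worst:.6f}, bound = {eb}")
assert worst <= eb
```

Snapping to a grid of width 2 * eb moves each value by at most eb, which is why the bound holds here. A production compressor also has to guard against floating-point rounding at bin edges, which this sketch does not attempt.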

SZ3 compression can be separated into four stages: prediction, quantization, Huffman coding, and lossless compression. We focus on the first two stages to improve performance and satisfy diverse user requirements. We designed several prediction algorithms, including Lorenzo, linear regression, and interpolation. The performance of a predictor depends heavily on the data patterns, so we also designed methods that choose the most suitable predictor based on data sampling. We proposed range-based and region-based quantization, which vary the error bound across data ranges and regions to further improve compression performance.
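The first two stages can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the SZ3 implementation. It uses a 1D Lorenzo-style predictor (the previous reconstructed value) and a uniform quantizer with bins of width 2 * error_bound. The function names `compress` and `decompress` are hypothetical. The Huffman and lossless stages, and the handling of values whose codes fall outside a fixed range, are omitted.

```python
def compress(values, error_bound):
    """Prediction + quantization: return one integer code per value."""
    step = 2.0 * error_bound
    codes = []
    prev = 0.0  # last reconstructed value; the decoder tracks the same state
    for v in values:
        predicted = prev                      # 1D Lorenzo-style prediction
        code = round((v - predicted) / step)  # bin index of the prediction error
        prev = predicted + code * step        # value the decoder will reconstruct
        codes.append(code)
    return codes


def decompress(codes, error_bound):
    """Invert the sketch above: replay predictions and add quantized errors."""
    step = 2.0 * error_bound
    out = []
    prev = 0.0
    for code in codes:
        prev = prev + code * step
        out.append(prev)
    return out


data = [1.003, 1.021, 1.048, 1.044, 1.097, 1.312]
eb = 0.01

codes = compress(data, eb)
recon = decompress(codes, eb)
print(codes)
assert max(abs(a - b) for a, b in zip(data, recon)) <= eb
```

The encoder predicts from reconstructed values rather than original ones, so the decoder can reproduce exactly the same predictions. When the predictor fits the data well, most codes are small integers near zero, which is what makes the later entropy-coding stages effective.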

Further reading

  • Yuanjian Liu et al., Optimizing Multi-Range based Error-Bounded Lossy Compression for Scientific Datasets (HiPC 2021) 10.1109/HiPC53243.2021.00036
  • Yuanjian Liu et al., Understanding Effectiveness of Multi-Error-Bounded Lossy Compression for Preserving Ranges of Interest in Scientific Analysis (DRBSD-7) 10.1109/DRBSD754563.2021.00010
  • Yuanjian Liu et al., Optimizing Error-Bounded Lossy Compression for Scientific Data with Diverse Constraints (TPDS 2022) 10.1109/TPDS.2022.3194695