AOCL-Compression - 5.0 English

AOCL Performance Tuning Guide (63859)

Document ID: 63859
Release Date: 2024-10-10
Version: 5.0 English

10. AOCL-Compression

AOCL-Compression provides options to configure the library to best suit your use case. Options are available to enable/disable various optimizations at both compile time and run time. These options can significantly impact run time performance.

10.1. Compile Time Tuning

Compile time tuning is available through CMake options.
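
As a rough illustration of how such options reach the build, the sketch below assembles a CMake configure command that passes options as `-D<NAME>=<VALUE>` cache entries. The option names and values come from the table in this section; the source directory (`.`), the build directory (`build`), and the choice to combine these two options are assumptions made for the example, not recommendations from this guide.

```python
import shlex

def cmake_configure_command(source_dir, build_dir, options):
    """Build a `cmake` configure command from a dict of cache options."""
    cmd = ["cmake", "-S", source_dir, "-B", build_dir]
    for name, value in options.items():
        # Each compile-time option is passed as a -D<NAME>=<VALUE> cache entry.
        cmd.append(f"-D{name}={value}")
    return cmd

# Illustrative selection only; pick the options that suit your workload.
options = {
    "AOCL_LZ4_EXTRA_HASH_TABLE_UPDATES": "ON",
    "AOCL_LZ4_HASH_BITS_USED": "HIGH",
}

# Source and build directories are assumed paths for this sketch.
command = cmake_configure_command(".", "build", options)
print(shlex.join(command))
```

The printed line can be pasted into a shell, or the list can be handed to `subprocess.run(command, check=True)` from a build script.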

The following options enable or disable specific optimizations in the respective compression methods:

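Each entry below lists the flag (with the compression method it applies to), its description and allowed values, and its use case.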

AOCL_DECOMPRESS_FAST (LZ4 and ZSTD)

Description: Enables fast decompression modes that may trade compression speed or ratio for streams that decompress faster.
Values: 1, 2, OFF (default)
Use case: Applications that prioritize decompression speed, e.g. applications that compress once and decompress multiple times.

AOCL_LZ4_MATCH_SKIP_OPT_LDS_STRAT1 (LZ4)
AOCL_LZ4_MATCH_SKIP_OPT_LDS_STRAT2 (LZ4)
SNAPPY_MATCH_SKIP_OPT (Snappy)
AOCL_ZSTD_SEARCH_SKIP_OPT_DFAST_FAST (ZSTD)

Description: If no match is found for N bytes while parsing the input data, the parsing step size is increased from 1 to M and matches are looked for at these points only. This gives faster compression, at the expense of compression ratio, when matches are hard to find.

AOCL_LZ4_MATCH_SKIP_OPT_LDS_STRAT2 skips more aggressively than AOCL_LZ4_MATCH_SKIP_OPT_LDS_STRAT1 for LZ4.

Values: ON / OFF
Defaults: AOCL_LZ4_MATCH_SKIP_OPT_LDS_STRAT1 (OFF), AOCL_LZ4_MATCH_SKIP_OPT_LDS_STRAT2 (OFF), SNAPPY_MATCH_SKIP_OPT (ON), AOCL_ZSTD_SEARCH_SKIP_OPT_DFAST_FAST (ON)
Use case: Files that are hard to compress.

AOCL_LZ4_NEW_PRIME_NUMBER (LZ4)
AOCL_LZ4_EXTRA_HASH_TABLE_UPDATES (LZ4)
AOCL_LZ4_HASH_BITS_USED (LZ4)

Description: A hash table keeps a dictionary of matches found in the past for different byte patterns in the input. The multiplicative hashing function takes 5 bytes of input and multiplies them by a hard-coded prime number to get the hash.

AOCL_LZ4_NEW_PRIME_NUMBER: Uses an alternate prime number found through empirical studies. [Values: ON / OFF (default)]

AOCL_LZ4_EXTRA_HASH_TABLE_UPDATES: When a match of length N is found, the next comparison starts from src+N. By default, the skipped bytes are not added to the hash table. This flag inserts some of these skipped bytes into the hash table, which provides better compression. [Values: ON / OFF (default)]

AOCL_LZ4_HASH_BITS_USED: Uses more than 5 bytes (40 bits) when computing the hash. LOW: 41 bits, HIGH: 44 bits. [Values: LOW (default) / HIGH]

Use case:
AOCL_LZ4_NEW_PRIME_NUMBER: Determine experimentally whether it is useful for your data set.
AOCL_LZ4_EXTRA_HASH_TABLE_UPDATES: Better compression desired.
AOCL_LZ4_HASH_BITS_USED: Better speed desired.

AOCL_LZ4_OPT_PREFETCH_BACKWARDS (LZ4)

Description: Prefetch match candidates in advance.
Values: ON / OFF (default)
Use case: Data with a higher likelihood of finding matches.

AOCL_LZ4HC_DISABLE_PATTERN_ANALYSIS (LZ4HC)

Description: Disable fast code path to handle repeated byte patterns such as "000000". Faster compression when data does not have such patterns.
Values: ON (default) / OFF
Use case: Enable if data contains such patterns.

ENABLE_FAST_MATH

Description: Enable fast math optimizations.
Values: ON / OFF (default)
Use case: Enable if the application is not sensitive to floating-point numerical accuracy.
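
To make the match-skipping idea from the table above more concrete, here is a toy sketch of a greedy parser that widens its step after a run of misses. It is not the library's implementation: the dictionary-based table, the 4-element minimum match, and the values chosen for N (`miss_threshold=8`) and M (`skip_step=4`) are all assumptions made for illustration.

```python
import random

def count_probes(data, min_match=4, miss_threshold=None, skip_step=4):
    """Toy greedy parser. Returns (table probes, elements covered by matches).

    With miss_threshold=None the parser always advances by 1 after a miss.
    Otherwise, once miss_threshold consecutive misses have been seen, it
    advances by skip_step until the next match resets the miss counter.
    """
    table = {}   # maps a min_match-long pattern to its most recent position
    pos = 0
    misses = 0
    probes = 0
    matched = 0
    limit = len(data) - min_match
    while pos <= limit:
        key = tuple(data[pos:pos + min_match])
        probes += 1
        candidate = table.get(key)
        table[key] = pos
        if candidate is None:
            misses += 1
            skipping = miss_threshold is not None and misses >= miss_threshold
            pos += skip_step if skipping else 1
            continue
        # Extend the match as far as the two positions keep agreeing.
        length = min_match
        while pos + length < len(data) and data[candidate + length] == data[pos + length]:
            length += 1
        matched += length
        misses = 0
        pos += length
    return probes, matched

rng = random.Random(0)
noisy = bytes(rng.getrandbits(8) for _ in range(4096))  # hard to compress
repetitive = b"abcd" * 100                              # easy to compress

for label, data in (("noisy", noisy), ("repetitive", repetitive)):
    plain = count_probes(data)
    skipping = count_probes(data, miss_threshold=8, skip_step=4)
    print(label, "without skipping:", plain, "with skipping:", skipping)
```

On the hard-to-compress input the skipping variant probes the table far less often, which is where the speed gain comes from. On the repetitive input a match is found before the miss threshold is reached, so both variants behave identically. Positions that are stepped over are never entered into the table, which is why the ratio can drop.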

10.2. Run Time Tuning

Run time tuning is available through environment variables.

The following options control library functionality at run time:

Each entry below lists the flag, its description and allowed values, and its use case.

AOCL_ZLIB_QUICK_MODE (ZLIB)

Description: Improves compression speed at the expense of compression ratio. Primarily for level 1. Improvements can be observed for levels 2, 3 and 5 as well.
Environment variable: AOCL_ZLIB_QUICK_MODE
Values: ON / OFF (default)
Use case: Applications that need faster compression speeds at lower levels.

AOCL_DISABLE_OPT

Description: Disable AOCL optimizations and run the reference implementation.
Environment variable: AOCL_DISABLE_OPT
Values: ON / OFF (default)
Use case: Benchmarking the performance improvements obtained by AOCL optimizations over the reference implementation.

OMP_NUM_THREADS

Description: Environment-variable-based thread control provided by OpenMP. The library must be built with AOCL_ENABLE_THREADS=ON for this to have an effect.
Environment variable: OMP_NUM_THREADS
Values: >= 1. Default: all threads that the implementation supports.
Use case: Limiting the number of threads used to run compression and decompression in multi-threaded mode.

Note: Setting OMP_NUM_THREADS is not required by default, because the algorithm automatically determines the number of threads to use based on the hardware and the input file size.
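
The sketch below shows one way to set these variables for a single child process without changing the parent shell's environment. The `run_with_env` helper is hypothetical, and the child here is a stand-in Python one-liner that only echoes the variables back; in practice it would be replaced by the application or benchmark that links AOCL-Compression.

```python
import os
import subprocess
import sys

def run_with_env(command, overrides):
    """Run `command` with `overrides` layered on top of the current environment."""
    env = dict(os.environ)
    env.update(overrides)
    return subprocess.run(command, env=env, capture_output=True, text=True, check=True)

# Stand-in child process (an assumption for this sketch): it only reports
# the run-time tuning variables it received.
child = [
    sys.executable,
    "-c",
    "import os; print(os.environ.get('AOCL_ZLIB_QUICK_MODE'), os.environ.get('OMP_NUM_THREADS'))",
]

result = run_with_env(child, {"AOCL_ZLIB_QUICK_MODE": "ON", "OMP_NUM_THREADS": "4"})
print(result.stdout.strip())  # prints: ON 4
```

Running the same command twice, once with `{"AOCL_DISABLE_OPT": "ON"}` and once without it, gives a simple way to compare the optimized path against the reference implementation.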

10.3. Reducing Run-to-Run Variation (Hardware Settings for Optimal Performance Benchmarking)

Some fluctuation in compression and decompression times across benchmark runs is normal. This variance does not come from non-deterministic elements in the algorithms or the benchmark; it comes mainly from the hardware environment.

To reduce these variations, consider the following helpful techniques:

  1. Clear the Caches: Clear the caches of the machine before running benchmarks to ensure consistent starting conditions.

  2. Isolate the Workload: Avoid running multiple workloads on the machine during benchmarking to prevent resource contention.

  3. Run Multiple Iterations: Perform ~50 iterations for single-threaded and ~100 for multi-threaded benchmarks, taking the best result to minimize anomalies.

  4. Disable SMT/Hyperthreading: Disable SMT/Hyperthreading to reduce variability caused by shared resources.

  5. Bind Processes to Cores: Use numactl to bind the benchmarking process to specific cores and memory nodes. For multi-threaded benchmarks, OpenMP affinity settings such as OMP_PROC_BIND can be used.
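
Techniques 3 and 5 above can be sketched in a small harness. This is only an illustration under stated assumptions: Python's built-in `zlib` module stands in for the workload (it is not AOCL-Compression), the `best_of` and `pin_to_core` helpers are hypothetical, and `os.sched_setaffinity` is used in place of numactl and is only available on some platforms such as Linux.

```python
import os
import time
import zlib

def pin_to_core(core=None):
    """Pin this process to one core where the OS supports it.

    Returns True if the affinity was applied, False otherwise.
    """
    if not hasattr(os, "sched_setaffinity"):
        return False  # e.g. macOS or Windows
    allowed = os.sched_getaffinity(0)
    if core is None:
        core = min(allowed)  # default to the lowest core we may run on
    if core not in allowed:
        return False
    os.sched_setaffinity(0, {core})
    return True

def best_of(workload, iterations):
    """Run `workload` repeatedly and return the best (smallest) time in seconds."""
    best = float("inf")
    for _ in range(iterations):
        start = time.perf_counter()
        workload()
        best = min(best, time.perf_counter() - start)
    return best

pinned = pin_to_core()
payload = bytes(range(256)) * 4096  # 1 MiB of sample data

# ~50 iterations for a single-threaded run, keeping the best result.
best_seconds = best_of(lambda: zlib.compress(payload, 1), iterations=50)
print(f"pinned={pinned} best={best_seconds * 1e3:.3f} ms")
```

Taking the minimum rather than the mean discards iterations disturbed by interrupts or other activity on the machine, which matches the "best result" guidance in the list.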