10. AOCL-Compression
AOCL-Compression provides options to configure the library to best suit your use case. Options are available to enable/disable various optimizations at both compile time and run time. These options can significantly impact run time performance.
10.1. Compile Time Tuning
Compile time tuning is available through CMake options.
The following options enable or disable specific optimizations in the respective compression methods:
| Flag | Description | Use case |
|---|---|---|
| AOCL_DECOMPRESS_FAST (LZ4 and ZSTD) | Enable fast decompression modes that may sacrifice compression speed or ratio to produce streams that decompress faster. [Values: 1, 2, OFF (default)] | Applications that focus on decompression speed, e.g., applications that compress once and decompress many times. |
| AOCL_LZ4_MATCH_SKIP_OPT_LDS_STRAT1 (LZ4)<br>AOCL_LZ4_MATCH_SKIP_OPT_LDS_STRAT2 (LZ4)<br>SNAPPY_MATCH_SKIP_OPT (Snappy)<br>AOCL_ZSTD_SEARCH_SKIP_OPT_DFAST_FAST (ZSTD) | If no matches are found for N bytes while parsing the input data, increase the parsing step size from 1 to M and look for matches only at these points. This speeds up compression when matches are hard to find, at the expense of compression ratio. For LZ4, AOCL_LZ4_MATCH_SKIP_OPT_LDS_STRAT2 skips more aggressively than AOCL_LZ4_MATCH_SKIP_OPT_LDS_STRAT1. [Values: ON / OFF. Defaults: AOCL_LZ4_MATCH_SKIP_OPT_LDS_STRAT1 (OFF), AOCL_LZ4_MATCH_SKIP_OPT_LDS_STRAT2 (OFF), SNAPPY_MATCH_SKIP_OPT (ON), AOCL_ZSTD_SEARCH_SKIP_OPT_DFAST_FAST (ON)] | Files that are hard to compress |
| AOCL_LZ4_NEW_PRIME_NUMBER (LZ4)<br>AOCL_LZ4_EXTRA_HASH_TABLE_UPDATES (LZ4)<br>AOCL_LZ4_HASH_BITS_USED (LZ4) | A hash table is used to keep a dictionary of matches found in the past for different byte patterns in the input. The multiplicative hashing function takes 5 bytes of input and multiplies them by a hard-coded prime number to get the hash.<br>AOCL_LZ4_NEW_PRIME_NUMBER: Use an alternate prime number found through empirical studies. [Values: ON / OFF (default)]<br>AOCL_LZ4_EXTRA_HASH_TABLE_UPDATES: When a match of length N is found, the next comparison starts from src+N. By default, the skipped bytes are not added to the hash table. This flag inserts some of these skipped bytes into the hash table, thus providing better compression. [Values: ON / OFF (default)]<br>AOCL_LZ4_HASH_BITS_USED: Use more than 5 bytes (40 bits) when computing the hash. LOW: 41 bits, HIGH: 44 bits. [Values: LOW (default) / HIGH] | AOCL_LZ4_NEW_PRIME_NUMBER: Determine experimentally whether it is useful for your data set<br>AOCL_LZ4_EXTRA_HASH_TABLE_UPDATES: Better compression desired<br>AOCL_LZ4_HASH_BITS_USED: Better speed desired |
| AOCL_LZ4_OPT_PREFETCH_BACKWARDS (LZ4) | Prefetch match candidates in advance. [Values: ON / OFF (default)] | Data with a higher likelihood of finding matches |
| AOCL_LZ4HC_DISABLE_PATTERN_ANALYSIS (LZ4HC) | Disable the fast code path that handles repeated byte patterns such as “000000”. Compression is faster when the data does not have such patterns. [Values: ON (default) / OFF] | Enable if data contains such patterns |
| ENABLE_FAST_MATH | Enable fast math optimizations. [Values: ON / OFF (default)] | Enable if the application is not sensitive to floating-point numerical accuracy |
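
As an illustration of how the options in the table above might be passed to CMake, the following minimal Python sketch assembles a configure command line using CMake's standard `-D<NAME>=<VALUE>` syntax. The option names and values come from the table; the helper name `cmake_configure_command`, the source directory `.`, and the build directory `build` are assumptions made for this example, and the particular combination of options is only an example, not a recommendation.

```python
import shlex

def cmake_configure_command(options, source_dir=".", build_dir="build"):
    """Return the argument list for a CMake configure step.

    `source_dir` and `build_dir` are placeholder paths (assumptions).
    Boolean values are translated to ON / OFF; other values are used as-is.
    """
    args = ["cmake", "-S", source_dir, "-B", build_dir]
    for name, value in options.items():
        if isinstance(value, bool):
            value = "ON" if value else "OFF"
        args.append(f"-D{name}={value}")
    return args

# Example selection for a compress-once, decompress-many workload.
decompress_heavy = {
    "AOCL_DECOMPRESS_FAST": 1,
    "AOCL_LZ4_EXTRA_HASH_TABLE_UPDATES": True,
    "AOCL_LZ4HC_DISABLE_PATTERN_ANALYSIS": False,
}

command = cmake_configure_command(decompress_heavy)
print(shlex.join(command))  # shell-ready command line; nothing is executed
```

The sketch only prints the command. Passing the list to `subprocess.run(command, check=True)` would run the configure step, assuming CMake is installed and the source directory contains the library's CMake project.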
10.2. Run Time Tuning
Run time tuning is available through environment variables.
The following options control library functionality at run time:
| Flag | Description | Use case |
|---|---|---|
| AOCL_ZLIB_QUICK_MODE (ZLIB) | Improves compression speed at the expense of compression ratio. Primarily benefits level 1; improvements can also be observed for levels 2, 3, and 5. [Set environment variable: AOCL_ZLIB_QUICK_MODE. Values: ON / OFF (default)] | Applications that need faster compression at lower levels. |
| AOCL_DISABLE_OPT | Disable AOCL optimizations and run the reference implementation. [Set environment variable: AOCL_DISABLE_OPT. Values: ON / OFF (default)] | Benchmarking the performance improvements of AOCL optimizations over the reference implementation. |
| OMP_NUM_THREADS | Environment-variable-based thread control provided by OpenMP. The library must be built with AOCL_ENABLE_THREADS=ON for this to have an effect. [Set environment variable: OMP_NUM_THREADS. Values: >= 1. Default: all threads that the implementation supports.] | Limiting the number of threads used for compression and decompression in multi-threaded mode. Note: setting OMP_NUM_THREADS is not required by default, because the algorithm automatically determines the number of threads to use based on the hardware and the input file size. |
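
The environment variables in the table above must be set before the process that uses the library starts. As a minimal sketch, the Python helper below builds such an environment for a child process. The variable names and values come from the table; the helper name `aocl_runtime_env` and the program name in the usage comment are hypothetical, and writing an explicit `OFF` is assumed to be equivalent to leaving the variable unset, since the table lists `OFF` as the default.

```python
import os

def aocl_runtime_env(quick_mode=False, disable_opt=False, num_threads=None, base=None):
    """Return a copy of the environment with the run-time tuning variables set.

    `base` defaults to the current process environment. OMP_NUM_THREADS is
    only set when `num_threads` is given, so the library's automatic thread
    selection stays in effect otherwise.
    """
    env = dict(os.environ if base is None else base)
    env["AOCL_ZLIB_QUICK_MODE"] = "ON" if quick_mode else "OFF"
    env["AOCL_DISABLE_OPT"] = "ON" if disable_opt else "OFF"
    if num_threads is not None:
        if num_threads < 1:
            raise ValueError("OMP_NUM_THREADS must be >= 1")
        env["OMP_NUM_THREADS"] = str(num_threads)
    return env

# Two configurations for an optimized-versus-reference comparison.
optimized = aocl_runtime_env(quick_mode=True, num_threads=8, base={})
reference = aocl_runtime_env(disable_opt=True, base={})
print(optimized)
print(reference)

# Hypothetical launch of a program linked against the library:
# subprocess.run(["./my_benchmark", "input.bin"], env=aocl_runtime_env(quick_mode=True))
```

Running the same program once with `optimized` and once with `reference` is one way to measure the gain from the AOCL optimizations, which is the use case the table gives for AOCL_DISABLE_OPT.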
10.3. Reducing Run-to-Run Variation (Hardware Settings for Optimal Performance Benchmarking)
Some fluctuation in compression and decompression times during benchmark runs is normal. The observed variance in performance is not due to non-deterministic elements in the algorithms or the benchmark. Instead, it is mainly due to the hardware environment.
To reduce this variation, consider the following techniques:
- Clear the Caches: Clear the caches of the machine before running benchmarks to ensure consistent starting conditions.
- Isolate the Workload: Avoid running multiple workloads on the machine during benchmarking to prevent resource contention.
- Run Multiple Iterations: Perform ~50 iterations for single-threaded and ~100 for multi-threaded benchmarks, taking the best result to minimize anomalies.
- Disable SMT/Hyperthreading: Disable SMT/Hyperthreading to reduce variability caused by shared resources.
- Bind Processes to Cores: Use numactl to bind the benchmarking process to specific cores and memory nodes. For multi-threaded benchmarks, OpenMP affinity settings such as OMP_PROC_BIND can be used.
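
The "run multiple iterations and take the best result" technique can be sketched as a small timing harness. The Python example below is illustrative only: it uses Python's built-in `zlib` module as a stand-in workload, not the AOCL-Compression API, and the helper name `best_time` and the synthetic input are assumptions made for this example.

```python
import time
import zlib

def best_time(workload, iterations=50):
    """Run workload() `iterations` times.

    Returns (fastest wall-clock time in seconds, list of all samples).
    """
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        workload()
        samples.append(time.perf_counter() - start)
    return min(samples), samples

# 1 MiB of synthetic input; real benchmarks should use representative data.
data = bytes(range(256)) * 4096

# Stand-in workload: level 1 compression with Python's zlib module.
best, samples = best_time(lambda: zlib.compress(data, 1), iterations=50)
print(f"best: {best * 1e3:.3f} ms, worst: {max(samples) * 1e3:.3f} ms")
```

The gap between the best and worst samples gives a rough indication of how much run-to-run variation remains. To combine this with core binding, a script like this could be started under numactl, for example with its `--physcpubind` and `--membind` options; the appropriate core and node numbers depend on the machine.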