14. AOCL-Compression
AOCL-Compression is a software framework of various lossless data compression and decompression methods tuned and optimized for AMD “Zen”-based CPUs. This library suite supports the following:
Linux and Windows platforms.
lz4, zlib/deflate, lzma, zstd, bzip2, snappy, and lz4hc optimized compression and decompression methods.
A unified standardized API set and the existing native APIs of the respective methods.
OpenMP based multi-threaded implementation of lz4, lz4hc, zlib, zstd, and snappy compression methods.
A dynamic dispatcher that executes the optimal function variant, implemented using function multi-versioning, so that a single optimized library is portable across different x86 CPU architectures.
Instruction set dispatch, running non-optimized code, and log level selection using environment variables at runtime:
Instruction Set Dispatch (SSE2, AVX, AVX2, and AVX512).
Enabling of logging and selection of log level.
Non-optimized (reference) code execution as supported by user environment option.
A test suite for validation and performance benchmarking of the supported compression and decompression methods. The test suite also supports the benchmarking of IPP compression methods, such as lz4, lz4hc, bzip2, and zlib on Linux-based platforms.
The library build framework offers CTest-based testing of the test cases that are implemented using GTest and the library test suite. It also supports testing the compression methods directly through their native APIs, memory checks using Valgrind and ASAN, and source code coverage using GCOV.
A Python-based performance benchmarking automation script.
Doxygen-based documentation covering the library’s API-level details.
Custom build options to exclude the unnecessary compression methods from the library build for achieving a lower code footprint.
AOCL-Compression API Guide documentation is available at https://docs.amd.com/go/en-US/63861-AOCL-compression/
14.1. Requirements
CMake 3.22 or later
GCC 7.5.0 through 13.1.0
Clang 11.0 through Clang 17.0
AOCC 2.3 or later
For more information on supported operating systems, refer to Operating Systems.
14.2. Installation
14.2.1. Using Pre-Built Libraries
The library binary for Linux and Windows can be installed from one of the following:
AOCL-Compression page (https://www.amd.com/en/developer/aocl/compression.html)
AOCL master installer: tar and zip packages for Linux and Windows respectively (https://developer.amd.com/amd-aocl/)
14.2.2. Building from Source
Complete the following steps to build AOCL-Compression from source:
Download the AOCL-Compression source package from GitHub (amd/aocl-compression).
Follow the steps in the README file to build the library and test bench for Linux or Windows.
To build the library with multi-threaded support, set the CMake build option AOCL_ENABLE_THREADS=ON. The library uses OpenMP for multi-threaded support. The maximum number of threads can be set using the environment variable OMP_NUM_THREADS. To run a library built with AOCL_ENABLE_THREADS=ON in single-threaded mode, set OMP_NUM_THREADS=1.
Refer to the section Optional Build Parameters for the complete list of supported CMake build options.
It is recommended that applications use the unified APIs of the library rather than the native APIs: integration is easier, and adding a new compression method requires minimal code changes.
14.3. Running AOCL-Compression Test Bench on Linux
If the library is built from source, a test bench is built in addition to the library. The test bench supports several options to validate, benchmark, or debug the supported compression methods. It can be configured to use the unified APIs or native APIs to invoke compression methods supported by AOCL-Compression. It can also invoke and benchmark some of IPP’s compression methods.
To check the various options supported by the test bench, use one of the following commands:
aocl_compression_bench -h
or
aocl_compression_bench -help
Use the following command to run the test bench and validate the outputs from all the supported compression and decompression methods for a given input file:
aocl_compression_bench -a -t <input filename>
Use the following command to run the test bench and check the performance of a particular compression and decompression method for a given input file:
aocl_compression_bench -ezstd:5:0 -p <input filename>
Here, 5 is the level and 0 is the additional parameter to specify the custom window size for the ZSTD method.
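As a rough illustration of how an argument such as zstd:5:0 breaks down into method, level, and additional parameter, the following C sketch splits a string of that shape. The method_spec type and parse_method_spec() function are hypothetical names introduced here; this is not the test bench's own parser, which may accept or reject inputs differently.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical holder for the three fields of a "-e" argument. */
typedef struct {
    char method[16]; /* method name, e.g. "zstd" */
    long level;      /* compression level, -1 if absent */
    long opt_var;    /* additional parameter, -1 if absent */
} method_spec;

/* Split "method[:level[:optVar]]" into its parts.
 * Returns 0 on success, -1 if the method name is empty or too long. */
static int parse_method_spec(const char *arg, method_spec *out)
{
    assert(arg != NULL && out != NULL);

    const char *colon = strchr(arg, ':');
    size_t name_len = colon ? (size_t)(colon - arg) : strlen(arg);
    if (name_len == 0 || name_len >= sizeof(out->method))
        return -1;

    memcpy(out->method, arg, name_len);
    out->method[name_len] = '\0';
    out->level = -1;
    out->opt_var = -1;

    if (colon != NULL) {
        char *end = NULL;
        long level = strtol(colon + 1, &end, 10);
        if (end != colon + 1)   /* at least one digit was consumed */
            out->level = level;
        if (*end == ':')        /* a second field follows the level */
            out->opt_var = strtol(end + 1, NULL, 10);
    }
    return 0;
}
```

Calling parse_method_spec("zstd:5:0", &spec) fills spec with the method name zstd, level 5, and additional parameter 0, matching the description above.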
To run the test bench with error/debug/trace/info logs, build the library with -DAOCL_ENABLE_LOG_FEATURE=ON and set the environment variable AOCL_ENABLE_LOG to one of the following:
AOCL_ENABLE_LOG=ERR for Error logs
AOCL_ENABLE_LOG=INFO for Error and Info logs
AOCL_ENABLE_LOG=DEBUG for Error, Info, and Debug logs
AOCL_ENABLE_LOG=TRACE for Error, Info, Debug, and Trace logs
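The log levels are cumulative: each value enables its own category plus every category before it. A minimal C sketch of that ordering follows. The log_level enum and both functions are hypothetical names used only for illustration, and treating an unset or unrecognized value as "logging disabled" is an assumption, not documented library behavior.

```c
#include <assert.h>
#include <string.h>

/* Illustrative ordering of the four AOCL_ENABLE_LOG values (names are hypothetical). */
typedef enum { LOG_NONE = 0, LOG_ERR, LOG_INFO, LOG_DEBUG, LOG_TRACE } log_level;

/* Map the text of AOCL_ENABLE_LOG to a level.
 * Assumption: unset or unrecognized values disable logging. */
static log_level parse_log_level(const char *value)
{
    if (value == NULL)                 return LOG_NONE;
    if (strcmp(value, "ERR") == 0)     return LOG_ERR;
    if (strcmp(value, "INFO") == 0)    return LOG_INFO;
    if (strcmp(value, "DEBUG") == 0)   return LOG_DEBUG;
    if (strcmp(value, "TRACE") == 0)   return LOG_TRACE;
    return LOG_NONE;
}

/* A message of category msg is emitted when the selected level is at least msg. */
static int log_enabled(log_level selected, log_level msg)
{
    assert(msg != LOG_NONE);
    return selected >= msg;
}
```

For example, with AOCL_ENABLE_LOG=DEBUG this model reports Info messages as enabled and Trace messages as disabled, which mirrors the list above.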
Note
When building the library for best performance, do not enable AOCL_ENABLE_LOG_FEATURE.
To run the test bench using the native APIs, use the -n option. For example, the following command runs the test bench and validates the outputs from all the supported compression and decompression methods for a given input file using the native APIs:
aocl_compression_bench -a -n -t <input filename>
To test and benchmark the performance of IPP’s compression methods, use the test bench option -c. Currently, IPP’s lz4, lz4hc, bzip2, and zlib methods are supported by the test bench. Refer to the README file available with the source package in GitHub (amd/aocl-compression) for details and more test bench options.
14.4. Running AOCL-Compression Test Bench on Windows
The test bench on Windows supports all the user options available on Linux, except the -c option to link and test IPP’s compression methods. For more information, refer to the README file available with the source package in GitHub (amd/aocl-compression).
Note
Library portability on Windows is limited to the systems with support for AVX2 instruction set or later.
14.5. API Reference
14.5.1. Interface Data Structures
//Types of compression methods supported
typedef enum
{
    LZ4 = 0,
    LZ4HC,
    LZMA,
    BZIP2,
    SNAPPY,
    ZLIB,
    ZSTD,
    AOCL_COMPRESSOR_ALGOS_NUM
} aocl_compression_type;

typedef struct
{
    char *inBuf;         /**< Pointer to input buffer data */
    char *outBuf;        /**< Pointer to output buffer data */
    char *workBuf;       /**< Pointer to temporary work buffer */
    size_t inSize;       /**< Input data length */
    size_t outSize;      /**< Output data length */
    size_t level;        /**< Requested compression level */
    size_t optVar;       /**< Additional variables or parameters */
    int numThreads;      /**< Number of threads available for multi-threading */
    int numMPIranks;     /**< Number of available multi-core MPI ranks */
    size_t memLimit;     /**< Maximum memory limit for compression/decompression */
    int measureStats;    /**< Measure speed and size of compression/decompression */
    uint64_t cSize;      /**< Size of compressed output */
    uint64_t dSize;      /**< Size of decompressed output */
    uint64_t cTime;      /**< Time to compress input */
    uint64_t dTime;      /**< Time to decompress input */
    float cSpeed;        /**< Speed of compression */
    float dSpeed;        /**< Speed of decompression */
    int optOff;          /**< Turn off all optimizations */
    int optLevel;        /**< Optimization level: \n
                              0 - non-SIMD algorithmic optimizations, \n
                              1 - SSE2 optimizations, \n
                              2 - AVX optimizations, \n
                              3 - AVX2 optimizations, \n
                              4 - AVX512 optimizations */
} aocl_compression_desc;
14.5.2. Library Return Error Codes
typedef enum
{
    ERR_MEMORY_ALLOC = -6,          ///<Memory allocation failure
    ERR_INVALID_INPUT,              ///<Invalid input parameter provided
    ERR_UNSUPPORTED_METHOD,         ///<compression method not supported by the library
    ERR_EXCLUDED_METHOD,            ///<compression method excluded from this library build
    ERR_COMPRESSION_FAILED,         ///<failure in compression/decompression
    ERR_COMPRESSION_INVALID_OUTPUT  ///<invalid compression/decompression output
} aocl_error_type;
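Because only the first enumerator has an explicit value, the remaining codes follow implicitly from -5 up to -1. The sketch below turns a return value into a readable name, which can help when logging failures. The function aocl_error_name() is a hypothetical helper, not part of the library, and the enum is repeated locally only so that the sketch compiles on its own; real code would include aocl_compression.h instead. Treating every non-negative value as success is also an assumption.

```c
#include <stdint.h>

/* Local copy of the enum above so this sketch is self-contained;
 * in real code, include "aocl_compression.h" and drop this typedef. */
typedef enum {
    ERR_MEMORY_ALLOC = -6,
    ERR_INVALID_INPUT,             /* -5 */
    ERR_UNSUPPORTED_METHOD,        /* -4 */
    ERR_EXCLUDED_METHOD,           /* -3 */
    ERR_COMPRESSION_FAILED,        /* -2 */
    ERR_COMPRESSION_INVALID_OUTPUT /* -1 */
} aocl_error_type;

/* Hypothetical helper: name of a library return code.
 * Assumption: non-negative values are not errors. */
static const char *aocl_error_name(int64_t code)
{
    switch (code) {
    case ERR_MEMORY_ALLOC:               return "ERR_MEMORY_ALLOC";
    case ERR_INVALID_INPUT:              return "ERR_INVALID_INPUT";
    case ERR_UNSUPPORTED_METHOD:         return "ERR_UNSUPPORTED_METHOD";
    case ERR_EXCLUDED_METHOD:            return "ERR_EXCLUDED_METHOD";
    case ERR_COMPRESSION_FAILED:         return "ERR_COMPRESSION_FAILED";
    case ERR_COMPRESSION_INVALID_OUTPUT: return "ERR_COMPRESSION_INVALID_OUTPUT";
    default:
        return code >= 0 ? "NO_ERROR" : "UNKNOWN_ERROR";
    }
}
```

A caller could, for instance, print aocl_error_name(result) whenever a compress or decompress call returns a negative value.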
14.5.3. Unified Standardized API Set
//Interface API to provide the maximum size that compression may
//output in a "worst case" scenario (input data not compressible)
int64_t aocl_llc_compressBound(aocl_compression_type codec_type,
size_t inSize);
//Interface API to compress data
int64_t aocl_llc_compress(aocl_compression_desc *handle,
aocl_compression_type codec_type);
//Interface API to decompress data
int64_t aocl_llc_decompress(aocl_compression_desc *handle,
aocl_compression_type codec_type);
//Interface API to setup the compression method
int32_t aocl_llc_setup(aocl_compression_desc *handle,
aocl_compression_type codec_type);
//Interface API to destroy the compression method
void aocl_llc_destroy(aocl_compression_desc *handle,
aocl_compression_type codec_type);
//Interface API to get compression library version string
const char* aocl_llc_version(void);
14.5.4. Multi-Threaded API Set
//Interface API to get the length of the RAP frame in the compressed stream
int32_t aocl_llc_skip_rap_frame(char* src, int32_t src_size);
14.5.5. Native APIs
//bzip2 Interface API to compress data
int BZ2_bzBuffToBuffCompress(char* dest, unsigned int* destLen,
char* source, unsigned int sourceLen, int blockSize100k,
int verbosity, int workFactor);
//bzip2 Interface API to decompress data
int BZ2_bzBuffToBuffDecompress(char* dest, unsigned int* destLen,
char* source, unsigned int sourceLen, int small, int verbosity);
//lz4 Interface API to compress data
int LZ4_compress_default(const char* src, char* dst,
int srcSize, int dstCapacity);
//lz4 Interface API to decompress data
int LZ4_decompress_safe(const char* src, char* dst,
int compressedSize, int dstCapacity);
//lz4hc Interface API to compress data
int LZ4_compress_HC(const char* src, char* dst,
int srcSize, int dstCapacity, int compressionLevel);
//lz4hc Interface API to decompress data
int LZ4_decompress_safe(const char* src, char* dst,
int compressedSize, int dstCapacity);
//lzma Interface API to compress data
int LzmaEncode(Byte *dest, SizeT *destLen, const Byte *src, SizeT srcLen,
const CLzmaEncProps *props, Byte *propsEncoded, SizeT *propsSize,
int writeEndMark, ICompressProgress *progress, ISzAllocPtr alloc,
ISzAllocPtr allocBig);
//lzma Interface API to decompress data
int LzmaDecode(Byte *dest, SizeT *destLen, const Byte *src, SizeT *srcLen,
const Byte *propData, unsigned propSize, ELzmaFinishMode finishMode,
ELzmaStatus *status, ISzAllocPtr alloc);
//snappy Interface API to compress data
void RawCompress(const char* input, size_t input_length, char* compressed,
size_t* compressed_length);
//snappy Interface API to decompress data
bool RawUncompress(const char* compressed, size_t compressed_length,
char* uncompressed);
//zlib Interface API to compress data
int compress2(unsigned char *dest, unsigned long *destLen,
const unsigned char *source, unsigned long sourceLen, int level);
//zlib Interface API to decompress data
int uncompress(unsigned char *dest, unsigned long *destLen,
const unsigned char *source, unsigned long sourceLen);
//zstd Interface API to compress data
size_t ZSTD_compress_advanced(ZSTD_CCtx* cctx, void* dst,
size_t dstCapacity, const void* src, size_t srcSize,
const void* dict,size_t dictSize, ZSTD_parameters params);
//zstd Interface API to decompress data
size_t ZSTD_decompressDCtx(ZSTD_DCtx* dctx,
void* dst, size_t dstCapacity, const void* src, size_t srcSize);
14.5.6. Example Program
14.5.6.1. Single-Threaded APIs
The following example program shows the sample usage and calling sequence of AOCL-Compression APIs to compress and decompress a test input:
#include <stdio.h>
#include <stdlib.h>
#include "aocl_compression.h"

int main(int argc, char **argv)
{
    // Zero-initialize so that fields not set below have defined values
    aocl_compression_desc aocl_compression_ds = {0};
    aocl_compression_desc *aocl_compression_handle = &aocl_compression_ds;
    FILE *inFp = NULL;
    int file_size = 0;
    int ret = -1; // exit status; set to 0 only after a successful round trip
    char *inPtr = NULL, *compPtr = NULL, *decompPtr = NULL;
    int64_t resultCompBound = 0, resultComp = 0, resultDecomp = 0;

    printf("Running example_unified_api\n");
    printf("Demonstrates using unified APIs for LZ4 compression and decompression\n");
    if (argc < 2)
    {
        printf("Provide input test file path\n");
        return -1;
    }
    inFp = fopen(argv[1], "rb");
    if (inFp == NULL)
    {
        printf("Failed to open input file\n");
        return -1;
    }
    fseek(inFp, 0L, SEEK_END);
    file_size = (int)ftell(inFp);
    rewind(inFp);

    // One of the compression methods as per aocl_compression_type
    aocl_compression_type method = LZ4;
    aocl_compression_handle->level = 0;
    aocl_compression_handle->optVar = 0;
    aocl_compression_handle->optOff = 0;
    aocl_compression_handle->measureStats = 0;

    // 1. setup and create a handle
    if (aocl_llc_setup(aocl_compression_handle, method) != 0)
    {
        printf("Setup: failed\n");
        goto error_exit;
    }

    // 2. allocate buffers
    aocl_compression_handle->inSize = file_size;
    resultCompBound = aocl_llc_compressBound(method, aocl_compression_handle->inSize);
    if (resultCompBound < 0)
    {
        printf("CompressBound: failed\n");
        goto error_exit;
    }
    aocl_compression_handle->outSize = resultCompBound;
    inPtr = (char*)calloc(1, aocl_compression_handle->inSize);
    compPtr = (char*)calloc(1, aocl_compression_handle->outSize);
    decompPtr = (char*)calloc(1, aocl_compression_handle->inSize);
    if (inPtr == NULL || compPtr == NULL || decompPtr == NULL)
    {
        printf("Buffer allocation: failed\n");
        goto error_exit;
    }
    aocl_compression_handle->inBuf = inPtr;
    aocl_compression_handle->outBuf = compPtr;
    file_size = (int)fread(inPtr, 1, file_size, inFp);

    // 3. compress
    resultComp = aocl_llc_compress(aocl_compression_handle, method);
    if (resultComp <= 0)
    {
        printf("Compression: failed\n");
        goto error_exit;
    }
    printf("Compression: done\n");

    // 4. decompress
    aocl_compression_handle->inSize = resultComp;
    aocl_compression_handle->outSize = file_size;
    aocl_compression_handle->inBuf = compPtr;
    aocl_compression_handle->outBuf = decompPtr;
    resultDecomp = aocl_llc_decompress(aocl_compression_handle, method);
    if (resultDecomp <= 0)
    {
        printf("Decompression Failure\n");
        goto error_exit;
    }
    printf("Decompression: done\n");

    // 5. destroy handle
    aocl_llc_destroy(aocl_compression_handle, method);
    ret = 0;

error_exit:
    // Release the input file and all buffers on both success and failure paths
    fclose(inFp);
    if (inPtr)
        free(inPtr);
    if (compPtr)
        free(compPtr);
    if (decompPtr)
        free(decompPtr);
    return ret;
}
To build this example test program on a Linux system using GCC or AOCC, specify the path to the aocl_compression.h header file and link with the libaocl_compression.so file as follows:
$ gcc test.c -I<aocl_compression.h file path> -L<libaocl_compression.so file path> -laocl_compression
14.5.6.2. Multi-Threaded APIs
When the library is built with multi-threaded support (refer to section Building from Source), a Random Access Point (RAP) frame is added at the start of the compressed stream to support parallel decompression of the compressed stream/file. You must allocate sufficient additional bytes in the destination buffer to account for this frame. Use the aocl_llc_compressBound() API to query the destination buffer size to allocate.
A stream compressed with multi-threaded AOCL-Compression library can be decompressed using any single-threaded standard decompressor by skipping the initial block of bytes containing the RAP frame present at the start of the stream.
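Skipping the RAP frame amounts to advancing the buffer pointer by the frame length and reducing the size by the same amount. The sketch below shows that arithmetic with a range check. The split_rap_frame() helper is a hypothetical name; in practice the frame length would come from aocl_llc_skip_rap_frame(), and since this section does not state how that API reports errors, rejecting negative or oversized lengths is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: given a buffer of `size` bytes that begins with a
 * RAP frame of `rap_len` bytes, compute where the format-compliant
 * payload starts and how long it is.
 * Returns 0 on success, -1 if rap_len is negative or exceeds the buffer. */
static int split_rap_frame(const char *buf, size_t size, int32_t rap_len,
                           const char **payload, size_t *payload_size)
{
    assert(buf != NULL && payload != NULL && payload_size != NULL);

    if (rap_len < 0 || (size_t)rap_len > size)
        return -1;

    *payload = buf + rap_len;               /* first byte after the RAP frame */
    *payload_size = size - (size_t)rap_len; /* bytes left for the decompressor */
    return 0;
}
```

The resulting payload pointer and size are what a single-threaded decompressor would receive as its input buffer and input length.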
The following test program shows the sample usage and calling sequence of AOCL-Compression APIs to get a single-threaded (ST) compatible compressed stream from the stream produced by the AOCL multi-threaded (MT) compressor:
#include <stdio.h>
#include <stdlib.h>
#include "aocl_compression.h"

int main(int argc, char** argv)
{
    // Zero-initialize so that fields not set below have defined values
    aocl_compression_desc aocl_compression_ds = {0};
    aocl_compression_desc* aocl_compression_handle = &aocl_compression_ds;
    FILE* inFp = NULL;
    int file_size = 0;
    int ret = -1; // exit status; set to 0 only after a successful round trip
    char* inPtr = NULL, * compPtr = NULL, * decompPtr = NULL;
    int64_t resultCompBound = 0, resultComp = 0, resultDecomp = 0;

    printf("Running example_aocl_llc_skip_rap_frame\n");
    printf("Demonstrates obtaining format-compliant compressed stream from a stream produced by AOCL multi-threaded compressor\n");
    if (argc < 2)
    {
        printf("Provide input test file path\n");
        return -1;
    }
    inFp = fopen(argv[1], "rb");
    if (inFp == NULL)
    {
        printf("Failed to open input file\n");
        return -1;
    }
    fseek(inFp, 0L, SEEK_END);
    file_size = (int)ftell(inFp);
    rewind(inFp);

    aocl_compression_type method = LZ4; // One of the compression methods as per aocl_compression_type
    aocl_compression_handle->level = 0;
    aocl_compression_handle->optVar = 0;
    aocl_compression_handle->optOff = 0;
    aocl_compression_handle->measureStats = 0;

    // 1. setup and create a handle
    if (aocl_llc_setup(aocl_compression_handle, method) != 0)
    {
        printf("Setup: failed\n");
        goto error_exit;
    }

    // 2. allocate buffers
    aocl_compression_handle->inSize = file_size;
    resultCompBound = aocl_llc_compressBound(method, aocl_compression_handle->inSize);
    if (resultCompBound < 0)
    {
        printf("CompressBound: failed\n");
        goto error_exit;
    }
    aocl_compression_handle->outSize = resultCompBound;
    inPtr = (char*)calloc(1, aocl_compression_handle->inSize);
    compPtr = (char*)calloc(1, aocl_compression_handle->outSize);
    decompPtr = (char*)calloc(1, aocl_compression_handle->inSize);
    if (inPtr == NULL || compPtr == NULL || decompPtr == NULL)
    {
        printf("Buffer allocation: failed\n");
        goto error_exit;
    }
    aocl_compression_handle->inBuf = inPtr;
    aocl_compression_handle->outBuf = compPtr;
    file_size = (int)fread(inPtr, 1, file_size, inFp);

    // 3. MT compress
    resultComp = aocl_llc_compress(aocl_compression_handle, method);
    if (resultComp <= 0)
    {
        printf("Compression: failed\n");
        goto error_exit;
    }
    printf("Compression: done\n");

    // 4. ST decompress
    // Get number of bytes for the RAP frame
    int rap_frame_len = aocl_llc_skip_rap_frame((char*)compPtr, (int32_t)resultComp);
    // Skip RAP frame in input stream and pass this to ST decompressor
    aocl_compression_handle->inSize = resultComp - rap_frame_len;
    aocl_compression_handle->outSize = file_size;
    aocl_compression_handle->inBuf = compPtr + rap_frame_len;
    aocl_compression_handle->outBuf = decompPtr;
    // Pass format compliant stream to aocl decompressor (or any legacy ST decompressor)
    resultDecomp = aocl_llc_decompress(aocl_compression_handle, method);
    if (resultDecomp <= 0)
    {
        printf("Decompression Failure\n");
        goto error_exit;
    }
    printf("Decompression: done\n");

    // 5. destroy handle
    aocl_llc_destroy(aocl_compression_handle, method);
    ret = 0;

error_exit:
    // Release the input file and all buffers on both success and failure paths
    fclose(inFp);
    if (inPtr)
        free(inPtr);
    if (compPtr)
        free(compPtr);
    if (decompPtr)
        free(decompPtr);
    return ret;
}
To build this example program on a Linux system using GCC or AOCC, specify the path to the aocl_compression.h header file and link with the libaocl_compression.so file as follows:
$ gcc test.c -I <aocl_compression.h file path> -L <libaocl_compression.so file path> -laocl_compression
14.6. Optional Build Parameters
AOCL-Compression provides options to configure the library to best suit your use case. Unless the table notes otherwise, these optional features are disabled by default and must be turned on as needed.
The following optional features can be configured:
| Option | Description | Use case |
|---|---|---|
| AOCL_ENABLE_THREADS | Enable multi-threaded compression and decompression using SMP based OpenMP threads. [Values: ON / OFF (default)] | Use multiple threads to speed up compression / decompression |
| AOCL_ENABLE_LOG_FEATURE | Enable logging support in the library. Log level is determined by the environment variable AOCL_ENABLE_LOG. [Values: ON / OFF (default)] | Debugging / Troubleshooting |
| AOCL_EXCLUDE_BZIP2, AOCL_EXCLUDE_LZ4, AOCL_EXCLUDE_LZ4HC, AOCL_EXCLUDE_LZMA, AOCL_EXCLUDE_SNAPPY, AOCL_EXCLUDE_ZLIB, AOCL_EXCLUDE_ZSTD | These flags can be used to exclude one or more compression methods from the library. If you want to build a library with only LZ4, enable all these flags except AOCL_EXCLUDE_LZ4. [Values: ON / OFF (default)] | Support only a subset of compression methods due to library size or performance concerns. |
| AOCL_XZ_UTILS_LZMA_API_EXPERIMENTAL | The LZMA implementation used by AOCL is from the 7z LZMA SDK. If APIs from xz utils LZMA are desired, wrappers around the 7z LZMA compress and decompress functions are provided to match the XZ APIs. NOTE: Experimental feature, not all APIs are supported yet. [Values: ON / OFF (default)] | xz utils APIs are desired |
| LZ4_FRAME_FORMAT_SUPPORT | LZ4 compressed data can be raw compressed data in blocks or can be packed inside a frame. If this flag is enabled, APIs that generate output in frame format, such as LZ4F_compressFrame, are enabled. [Values: ON (default) / OFF] | To use LZ4 frame format APIs and get LZ4 output in frame format |
The library also provides options to enable certain optimizations at both compile time and run time. These options impact the run-time performance of the library. For details, refer to the AOCL Performance Tuning Guide (https://docs.amd.com/go/en-US/63859-AOCL-performance-tuning-guide/).
14.7. Optional Runtime Parameters
Run time options to configure the library are available through environment variables.
14.7.1. Instruction Set Selection
The dynamic dispatcher in AOCL-Compression executes the optimal function variant based on the ISA supported by the machine it runs on.
Optionally, AOCL optimizations can be restricted to certain ISAs by setting the environment variable AOCL_ENABLE_INSTRUCTIONS. Supported values are SSE2, AVX, AVX2, and AVX512.
The environment variable must be set before launching the application for it to take effect.
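An application or launcher script may want to confirm that AOCL_ENABLE_INSTRUCTIONS holds one of the listed values before starting a long run. The following C sketch performs such a check. Both function names are hypothetical, the comparison is case-sensitive by assumption, and this is not how the library itself interprets the variable.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Values listed above as supported for AOCL_ENABLE_INSTRUCTIONS. */
static const char *const k_supported_isa[] = { "SSE2", "AVX", "AVX2", "AVX512" };

/* Returns 1 if value matches one of the listed ISA names exactly, else 0.
 * Assumption: matching is case-sensitive. */
static int is_supported_isa(const char *value)
{
    assert(value != NULL);
    size_t count = sizeof(k_supported_isa) / sizeof(k_supported_isa[0]);
    for (size_t i = 0; i < count; ++i) {
        if (strcmp(value, k_supported_isa[i]) == 0)
            return 1;
    }
    return 0;
}

/* Returns 1 if the variable is unset (no restriction requested)
 * or holds a listed value, and 0 otherwise. */
static int check_isa_env(void)
{
    const char *value = getenv("AOCL_ENABLE_INSTRUCTIONS");
    return value == NULL || is_supported_isa(value);
}
```

A caller could run check_isa_env() at startup and print a warning when it returns 0, so that a mistyped value is noticed early.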