AOCL-Compression - 5.0 English - 57404

AOCL User Guide (57404)

Document ID: 57404
Release Date: 2024-12-14
Version: 5.0 English

14. AOCL-Compression

AOCL-Compression is a software framework of various lossless data compression and decompression methods tuned and optimized for AMD “Zen”-based CPUs. This library suite supports the following:

  • Linux and Windows platforms.

  • Optimized lz4, lz4hc, zlib/deflate, lzma, zstd, bzip2, and snappy compression and decompression methods.

  • A unified standardized API set and the existing native APIs of the respective methods.

  • OpenMP-based multi-threaded implementation of lz4, lz4hc, zlib, zstd, and snappy compression methods.

  • A dynamic dispatcher that executes the optimal function variant, implemented using function multi-versioning, thereby offering a single optimized library that is portable across different x86 CPU architectures.

  • Runtime selection of the following features using environment variables:

    • Instruction set dispatch (SSE2, AVX, AVX2, and AVX512).

    • Enabling of logging and selection of log level.

    • Non-optimized (reference) code execution.

  • A test suite for validation and performance benchmarking of the supported compression and decompression methods. On Linux-based platforms, the test suite also supports benchmarking of Intel IPP compression methods (lz4, lz4hc, bzip2, and zlib).

  • The library build framework offers CTest-based testing of the GTest-based test cases and of the library test suite. It also supports testing the compression methods directly through their native APIs, memory checks using Valgrind and ASAN, and source code coverage using GCOV.

  • A Python-based performance benchmarking automation script.

  • Doxygen-based documentation covering the library’s API-level details.

  • Custom build options to exclude unneeded compression methods from the library build, reducing the code footprint.

The AOCL-Compression API Guide is available at https://docs.amd.com/go/en-US/63861-AOCL-compression/

14.1. Requirements

  • CMake 3.22 or later

  • GCC 7.5.0 through 13.1.0

  • Clang 11.0 through Clang 17.0

  • AOCC 2.3 or later

  • For more information on supported operating systems, refer to Operating Systems.

14.2. Installation

14.2.1. Using Pre-Built Libraries

The library binary for Linux and Windows can be installed from one of the following:

14.2.2. Building from Source

Complete the following steps to build AOCL-Compression from source:

  1. Download the AOCL-Compression source package from GitHub (amd/aocl-compression).

  2. Follow the steps in the README file to build the library and test bench for Linux or Windows.

  3. To build the library with multi-threaded support, set the CMake build option AOCL_ENABLE_THREADS=ON. The library uses OpenMP for multi-threading, and the maximum number of threads can be set with the environment variable OMP_NUM_THREADS. To run a library built with AOCL_ENABLE_THREADS=ON in single-threaded mode, set OMP_NUM_THREADS=1.

    Refer to the section Optional Build Parameters for the complete list of supported CMake build options.

  • It is recommended that applications use the library’s unified APIs rather than the native APIs. This eases integration and minimizes the code changes required when adding new compression methods.

14.3. Running AOCL-Compression Test Bench on Linux

If the library is built from source, a test bench is built in addition to the library. The test bench supports several options to validate, benchmark, or debug the supported compression methods. It can be configured to use the unified APIs or native APIs to invoke compression methods supported by AOCL-Compression. It can also invoke and benchmark some of IPP’s compression methods.

To check the various options supported by the test bench, use one of the following commands:

aocl_compression_bench -h

aocl_compression_bench -help

Use the following command to run the test bench and validate the outputs from all the supported compression and decompression methods for a given input file:

aocl_compression_bench -a -t <input filename>

Use the following command to run the test bench and check the performance of a particular compression and decompression method for a given input file:

aocl_compression_bench -ezstd:5:0 -p <input filename>

Here, 5 is the compression level, and 0 is an additional parameter that specifies a custom window size for the ZSTD method.

To run the test bench with error, info, debug, or trace logs, build the library with -DAOCL_ENABLE_LOG_FEATURE=ON and set the environment variable AOCL_ENABLE_LOG to one of the following values:

  • AOCL_ENABLE_LOG=ERR for Error logs

  • AOCL_ENABLE_LOG=INFO for Error, Info logs

  • AOCL_ENABLE_LOG=DEBUG for Error, Info, and Debug logs

  • AOCL_ENABLE_LOG=TRACE for Error, Info, Debug, and Trace logs
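The levels above are cumulative: each value enables its own messages plus those of every less verbose level. The C sketch below models only that ordering. The names log_level, parse_log_level(), and log_enabled() are hypothetical and are not part of the AOCL-Compression API. Treating an unset or unrecognized value as "logging off" is also an assumption, not documented library behavior.

```c
#include <string.h>

/* Hypothetical model of the cumulative AOCL_ENABLE_LOG levels.
   This is NOT the library's implementation; names are illustrative. */
typedef enum { LVL_OFF = 0, LVL_ERR, LVL_INFO, LVL_DEBUG, LVL_TRACE } log_level;

/* Map an AOCL_ENABLE_LOG value to a level.
   Assumption: NULL (unset) or an unknown string means logging is off. */
log_level parse_log_level(const char *value)
{
    if (value == NULL)                return LVL_OFF;
    if (strcmp(value, "ERR") == 0)    return LVL_ERR;
    if (strcmp(value, "INFO") == 0)   return LVL_INFO;
    if (strcmp(value, "DEBUG") == 0)  return LVL_DEBUG;
    if (strcmp(value, "TRACE") == 0)  return LVL_TRACE;
    return LVL_OFF;
}

/* A message of category msg is emitted when the configured level
   is at least as verbose as the message's category. */
int log_enabled(log_level configured, log_level msg)
{
    return msg != LVL_OFF && msg <= configured;
}
```

In an application, the configured level would come from parse_log_level(getenv("AOCL_ENABLE_LOG")), which additionally requires <stdlib.h>.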

Note

When building the library for best performance, do not enable AOCL_ENABLE_LOG_FEATURE.

To run the test bench using the native APIs, use the -n option. For example, the following command validates the outputs of all the supported compression and decompression methods for a given input file using the native APIs:

aocl_compression_bench -a -n -t <input filename>

To test and benchmark the performance of IPP’s compression methods, use the test bench option -c. Currently, the test bench supports IPP’s lz4, lz4hc, bzip2, and zlib methods. For details and more test bench options, refer to the README file available with the source package on GitHub (amd/aocl-compression).

14.4. Running AOCL-Compression Test Bench on Windows

The test bench on Windows supports all the user options available on Linux, except for the -c option, which links and tests IPP’s compression methods. For more information, refer to the README file available with the source package on GitHub (amd/aocl-compression).

Note

On Windows, library portability is limited to systems that support the AVX2 instruction set or later.

14.5. API Reference

14.5.1. Interface Data Structures

//Types of compression methods supported
typedef enum
{
  LZ4 = 0,
  LZ4HC,
  LZMA,
  BZIP2,
  SNAPPY,
  ZLIB,
  ZSTD,
  AOCL_COMPRESSOR_ALGOS_NUM
} aocl_compression_type;

typedef struct
{
  char *inBuf;         /**<  Pointer to input buffer data                           */
  char *outBuf;        /**<  Pointer to output buffer data                          */
  char *workBuf;       /**<  Pointer to temporary work buffer                       */
  size_t inSize;       /**<  Input data length                                      */
  size_t outSize;      /**<  Output data length                                     */
  size_t level;        /**<  Requested compression level                            */
  size_t optVar;       /**<  Additional variables or parameters                     */
  int numThreads;      /**<  Number of threads available for multi-threading        */
  int numMPIranks;     /**<  Number of available multi-core MPI ranks               */
  size_t memLimit;     /**<  Maximum memory limit for compression/decompression     */
  int measureStats;    /**<  Measure speed and size of compression/decompression    */
  uint64_t cSize;      /**<  Size of compressed output                              */
  uint64_t dSize;      /**<  Size of decompressed output                            */
  uint64_t cTime;      /**<  Time to compress input                                 */
  uint64_t dTime;      /**<  Time to decompress input                               */
  float cSpeed;        /**<  Speed of compression                                   */
  float dSpeed;        /**<  Speed of decompression                                 */
  int optOff;          /**<  Turn off all optimizations                             */
  int optLevel;        /**<  Optimization level:  \n
                             0 - non-SIMD algorithmic optimizations, \n
                             1 - SSE2 optimizations, \n
                             2 - AVX optimizations, \n
                             3 - AVX2 optimizations, \n
                             4 - AVX512 optimizations                               */
} aocl_compression_desc;
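In aocl_compression_type, AOCL_COMPRESSOR_ALGOS_NUM is not a method. Because the enumerators count up from LZ4 = 0, it equals the number of methods listed (7 in this listing), which makes it usable for sizing lookup tables or bounding loops. The sketch below copies the enum from the listing so that it compiles on its own. The helpers method_name() and method_from_name() and their name strings are hypothetical illustrations, and other library versions may list different methods.

```c
#include <stddef.h>
#include <string.h>

/* Local copy of aocl_compression_type as listed above, so the sketch compiles standalone. */
typedef enum
{
  LZ4 = 0,
  LZ4HC,
  LZMA,
  BZIP2,
  SNAPPY,
  ZLIB,
  ZSTD,
  AOCL_COMPRESSOR_ALGOS_NUM
} aocl_compression_type;

/* Hypothetical display names, indexed by aocl_compression_type. */
static const char *const method_names[AOCL_COMPRESSOR_ALGOS_NUM] = {
  "lz4", "lz4hc", "lzma", "bzip2", "snappy", "zlib", "zstd"
};

/* Return the display name of a method, or NULL for an out-of-range value. */
const char *method_name(aocl_compression_type t)
{
  if ((int)t < 0 || t >= AOCL_COMPRESSOR_ALGOS_NUM)
    return NULL;
  return method_names[t];
}

/* Look up a method by display name; returns -1 if the name is unknown. */
int method_from_name(const char *name)
{
  for (int i = 0; i < AOCL_COMPRESSOR_ALGOS_NUM; i++)
    if (name != NULL && strcmp(name, method_names[i]) == 0)
      return i;
  return -1;
}
```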

14.5.2. Library Return Error Codes

typedef enum
{
  ERR_MEMORY_ALLOC = -6,    ///<Memory allocation failure
  ERR_INVALID_INPUT,        ///<Invalid input parameter provided
  ERR_UNSUPPORTED_METHOD,   ///<compression method not supported by the library
  ERR_EXCLUDED_METHOD,      ///<compression method excluded from this library build
  ERR_COMPRESSION_FAILED,   ///<failure in compression/decompression
  ERR_COMPRESSION_INVALID_OUTPUT ///<invalid compression/decompression output
} aocl_error_type;
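Because the enumerators count up from ERR_MEMORY_ALLOC = -6, the error codes take the values -6 through -1. The sketch below copies the enum so that it compiles standalone and turns a return value into a message. aocl_error_string() is a hypothetical helper, not a library function. It also assumes that a negative return value from the unified APIs carries one of these codes.

```c
#include <stdint.h>

/* Local copy of aocl_error_type as listed above, with the implied values. */
typedef enum
{
  ERR_MEMORY_ALLOC = -6,
  ERR_INVALID_INPUT,              /* -5 */
  ERR_UNSUPPORTED_METHOD,         /* -4 */
  ERR_EXCLUDED_METHOD,            /* -3 */
  ERR_COMPRESSION_FAILED,         /* -2 */
  ERR_COMPRESSION_INVALID_OUTPUT  /* -1 */
} aocl_error_type;

/* Hypothetical helper: describe a return value from, e.g., aocl_llc_compress().
   Assumption: negative values are aocl_error_type codes. */
const char *aocl_error_string(int64_t ret)
{
  switch (ret)
  {
  case ERR_MEMORY_ALLOC:               return "memory allocation failure";
  case ERR_INVALID_INPUT:              return "invalid input parameter";
  case ERR_UNSUPPORTED_METHOD:         return "method not supported by the library";
  case ERR_EXCLUDED_METHOD:            return "method excluded from this library build";
  case ERR_COMPRESSION_FAILED:         return "compression/decompression failed";
  case ERR_COMPRESSION_INVALID_OUTPUT: return "invalid compression/decompression output";
  default:                             return ret < 0 ? "unknown error" : "not an error code";
  }
}
```

With this helper, a call to a method removed through an AOCL_EXCLUDE_* build option could then be reported as "method excluded from this library build" instead of a bare number.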

14.5.3. Unified Standardized API Set

//Interface API to provide the maximum size that compression may
//output in a "worst case" scenario (input data not compressible)
int64_t aocl_llc_compressBound(aocl_compression_type codec_type,
  size_t inSize);

//Interface API to compress data
int64_t aocl_llc_compress(aocl_compression_desc *handle,
  aocl_compression_type codec_type);

//Interface API to decompress data
int64_t aocl_llc_decompress(aocl_compression_desc *handle,
  aocl_compression_type codec_type);

//Interface API to set up the compression method
int32_t aocl_llc_setup(aocl_compression_desc *handle,
  aocl_compression_type codec_type);

//Interface API to destroy the compression method
void aocl_llc_destroy(aocl_compression_desc *handle,
  aocl_compression_type codec_type);

//Interface API to get compression library version string
const char* aocl_llc_version(void);
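The aocl_llc_compressBound() API above returns the maximum output size for incompressible input, so callers can size the destination buffer before compressing. For a sense of scale, the sketch below reproduces the worst-case formula from the upstream lz4.h macro LZ4_COMPRESSBOUND. That formula is input size + input size / 255 + 16, and it returns 0 above LZ4_MAX_INPUT_SIZE (0x7E000000 bytes). The sketch is illustrative only. lz4_worst_case_bound() is a hypothetical name, and other methods use different formulas. In multi-threaded builds, aocl_llc_compressBound() also reserves room for the RAP frame, so applications should call the library API rather than hard-code a formula.

```c
#include <stdint.h>

/* Value of LZ4_MAX_INPUT_SIZE in upstream lz4.h. */
#define LZ4_MAX_INPUT_SIZE_SKETCH 0x7E000000u

/* Hypothetical helper mirroring upstream LZ4_COMPRESSBOUND:
   worst-case LZ4 block output size for inSize input bytes.
   Returns 0 when the input exceeds the LZ4 block limit, as the macro does. */
int64_t lz4_worst_case_bound(uint64_t inSize)
{
  if (inSize > LZ4_MAX_INPUT_SIZE_SKETCH)
    return 0;
  return (int64_t)(inSize + inSize / 255 + 16);
}
```

For example, a 1000-byte LZ4 input needs at most 1000 + 3 + 16 = 1019 output bytes under this formula.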

14.5.4. Multi-Threaded API Set

//Interface API to get the length of the RAP frame in the compressed stream
int32_t aocl_llc_skip_rap_frame(char* src, int32_t src_size);

14.5.5. Native APIs

//bzip2 Interface API to compress data
int BZ2_bzBuffToBuffCompress(char* dest, unsigned int* destLen,
  char* source, unsigned int sourceLen, int blockSize100k,
  int verbosity, int workFactor);

//bzip2 Interface API to decompress data
int BZ2_bzBuffToBuffDecompress(char* dest, unsigned int* destLen,
  char* source, unsigned int sourceLen, int small, int verbosity);

//lz4 Interface API to compress data
int LZ4_compress_default(const char* src, char* dst,
  int srcSize, int dstCapacity);

//lz4 Interface API to decompress data
int LZ4_decompress_safe(const char* src, char* dst,
  int compressedSize, int dstCapacity);

//lz4hc Interface API to compress data
int LZ4_compress_HC(const char* src, char* dst,
  int srcSize, int dstCapacity, int compressionLevel);

//lz4hc Interface API to decompress data (LZ4HC output is decoded by the standard LZ4 decoder)
int LZ4_decompress_safe(const char* src, char* dst,
  int compressedSize, int dstCapacity);

//lzma Interface API to compress data
int LzmaEncode(Byte *dest, SizeT *destLen, const Byte *src, SizeT srcLen,
  const CLzmaEncProps *props, Byte *propsEncoded, SizeT *propsSize,
  int writeEndMark, ICompressProgress *progress, ISzAllocPtr alloc,
  ISzAllocPtr allocBig);

//lzma Interface API to decompress data
int LzmaDecode(Byte *dest, SizeT *destLen, const Byte *src, SizeT *srcLen,
  const Byte *propData, unsigned propSize, ELzmaFinishMode finishMode,
  ELzmaStatus *status, ISzAllocPtr alloc);

//snappy Interface API to compress data (C++, namespace snappy)
void RawCompress(const char* input, size_t input_length, char* compressed,
  size_t* compressed_length);

//snappy Interface API to decompress data (C++, namespace snappy)
bool RawUncompress(const char* compressed, size_t compressed_length,
  char* uncompressed);

//zlib Interface API to compress data
int compress2(unsigned char *dest, unsigned long *destLen,
  const unsigned char *source, unsigned long sourceLen, int level);

//zlib Interface API to decompress data
int uncompress(unsigned char *dest, unsigned long *destLen,
  const unsigned char *source, unsigned long sourceLen);

//zstd Interface API to compress data
size_t ZSTD_compress_advanced(ZSTD_CCtx* cctx, void* dst,
  size_t dstCapacity, const void* src, size_t srcSize,
  const void* dict, size_t dictSize, ZSTD_parameters params);

//zstd Interface API to decompress data
size_t ZSTD_decompressDCtx(ZSTD_DCtx* dctx,
  void* dst, size_t dstCapacity, const void* src, size_t srcSize);

14.5.6. Example Program

14.5.6.1. Single-Threaded APIs

The following example program shows sample usage and the calling sequence of the AOCL-Compression unified APIs to compress and decompress an input file:

#include <stdio.h>
#include <stdlib.h>
#include "aocl_compression.h"

int main (int argc, char **argv)
{
   aocl_compression_desc aocl_compression_ds = {0}; // zero-initialize all fields, including those not set below
   aocl_compression_desc* aocl_compression_handle = &aocl_compression_ds;
   aocl_compression_type method = LZ4; // One of the compression methods as per aocl_compression_type
   FILE* inFp = NULL;
   long file_size = 0;
   char* inPtr = NULL, * compPtr = NULL, * decompPtr = NULL;
   int64_t resultCompBound = 0, resultComp = 0, resultDecomp = 0;
   int setupDone = 0, ret = -1;

   printf("Running example_unified_api\n");
   printf("Demonstrates using unified APIs for LZ4 compression and decompression\n");
   if (argc < 2)
   {
       printf("Provide input test file path\n");
       return -1;
   }
   inFp = fopen(argv[1], "rb");
   if (inFp == NULL)
   {
       printf("Cannot open input file\n");
       return -1;
   }
   fseek(inFp, 0L, SEEK_END);
   file_size = ftell(inFp);
   rewind(inFp);
   if (file_size <= 0)
   {
       printf("Input file is empty or unreadable\n");
       goto error_exit;
   }

   aocl_compression_handle->level = 0;
   aocl_compression_handle->optVar = 0;
   aocl_compression_handle->optOff = 0;
   aocl_compression_handle->measureStats = 0;

   // 1. setup and create a handle
   if (aocl_llc_setup(aocl_compression_handle, method) != 0)
   {
       printf("Setup: failed\n");
       goto error_exit;
   }
   setupDone = 1;

   // 2. allocate buffers
   aocl_compression_handle->inSize = file_size;
   resultCompBound = aocl_llc_compressBound(method, aocl_compression_handle->inSize);
   if (resultCompBound < 0)
   {
       printf("CompressBound: failed\n");
       goto error_exit;
   }
   aocl_compression_handle->outSize = resultCompBound;
   inPtr = (char*)calloc(1, aocl_compression_handle->inSize);
   compPtr = (char*)calloc(1, aocl_compression_handle->outSize);
   decompPtr = (char*)calloc(1, aocl_compression_handle->inSize);
   if (!inPtr || !compPtr || !decompPtr)
   {
       printf("Allocation: failed\n");
       goto error_exit;
   }
   aocl_compression_handle->inBuf = inPtr;
   aocl_compression_handle->outBuf = compPtr;
   if (fread(inPtr, 1, file_size, inFp) != (size_t)file_size)
   {
       printf("Read: failed\n");
       goto error_exit;
   }

   // 3. compress
   resultComp = aocl_llc_compress(aocl_compression_handle, method);

   if (resultComp <= 0)
   {
       printf("Compression: failed\n");
       goto error_exit;
   }
   printf("Compression: done\n");

   // 4. decompress
   aocl_compression_handle->inSize = resultComp;
   aocl_compression_handle->outSize = file_size;
   aocl_compression_handle->inBuf = compPtr;
   aocl_compression_handle->outBuf = decompPtr;

   resultDecomp = aocl_llc_decompress(aocl_compression_handle, method);

   if (resultDecomp <= 0)
   {
       printf("Decompression Failure\n");
       goto error_exit;
   }
   printf("Decompression: done\n");
   ret = 0;

error_exit:
   // 5. destroy handle (also on failure paths, once setup has succeeded)
   if (setupDone)
       aocl_llc_destroy(aocl_compression_handle, method);
   if (inFp)
       fclose(inFp);
   free(inPtr);     // free(NULL) is a no-op
   free(compPtr);
   free(decompPtr);
   return ret;
}

To build this example program on a Linux system using GCC or AOCC, specify the path to the aocl_compression.h header file and link with the libaocl_compression.so file as follows:

$ gcc test.c -I<aocl_compression.h file path> -L<libaocl_compression.so file path> -laocl_compression

14.5.6.2. Multi-Threaded APIs

When the library is built with multi-threaded support (refer to section Building from Source), a Random Access Point (RAP) frame is added at the start of the compressed stream to support parallel decompression of the compressed stream or file. You must allocate enough additional bytes in the destination buffer to account for this frame. Use the aocl_llc_compressBound() API to query the destination buffer size to allocate.

A stream compressed with multi-threaded AOCL-Compression library can be decompressed using any single-threaded standard decompressor by skipping the initial block of bytes containing the RAP frame present at the start of the stream.
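Skipping the RAP frame amounts to advancing the buffer pointer and reducing the size by the frame length that aocl_llc_skip_rap_frame() reports. Step 4 of the following program does exactly this. The sketch below isolates that arithmetic and adds bounds checks. skip_prefix() is a hypothetical helper. Treating a negative length, or one larger than the buffer, as an error is an assumption about how to guard the call and is not documented library behavior.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: given a compressed buffer and the RAP frame length
   (for example, the value returned by aocl_llc_skip_rap_frame()),
   return a view of the format-compliant stream that follows the frame.
   Returns 0 on success, or -1 if the length is invalid for this buffer. */
int skip_prefix(const char *buf, int64_t bufSize, int32_t rapLen,
                const char **streamOut, int64_t *streamSizeOut)
{
  if (buf == NULL || rapLen < 0 || (int64_t)rapLen > bufSize)
    return -1;
  *streamOut = buf + rapLen;          /* start of the single-threaded-compatible stream */
  *streamSizeOut = bufSize - rapLen;  /* bytes left after the RAP frame */
  return 0;
}
```

A single-threaded decompressor would then receive the stream pointer and size returned by the helper in place of the original buffer.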

The following test program shows sample usage and the calling sequence of AOCL-Compression APIs to obtain a single-threaded (ST) compatible compressed stream from the stream produced by the AOCL multi-threaded (MT) compressor:

#include <stdio.h>
#include <stdlib.h>
#include "aocl_compression.h"

int main(int argc, char** argv)
{
   aocl_compression_desc aocl_compression_ds = {0}; // zero-initialize all fields, including those not set below
   aocl_compression_desc* aocl_compression_handle = &aocl_compression_ds;
   aocl_compression_type method = LZ4; // One of the compression methods as per aocl_compression_type
   FILE* inFp = NULL;
   long file_size = 0;
   char* inPtr = NULL, * compPtr = NULL, * decompPtr = NULL;
   int64_t resultCompBound = 0, resultComp = 0, resultDecomp = 0;
   int32_t rap_frame_len = 0;
   int setupDone = 0, ret = -1;

   printf("Running example_aocl_llc_skip_rap_frame\n");
   printf("Demonstrates obtaining format-compliant compressed stream from a stream produced by AOCL multi-threaded compressor\n");
   if (argc < 2)
   {
      printf("Provide input test file path\n");
      return -1;
   }
   inFp = fopen(argv[1], "rb");
   if (inFp == NULL)
   {
      printf("Cannot open input file\n");
      return -1;
   }
   fseek(inFp, 0L, SEEK_END);
   file_size = ftell(inFp);
   rewind(inFp);
   if (file_size <= 0)
   {
      printf("Input file is empty or unreadable\n");
      goto error_exit;
   }

   aocl_compression_handle->level = 0;
   aocl_compression_handle->optVar = 0;
   aocl_compression_handle->optOff = 0;
   aocl_compression_handle->measureStats = 0;

   // 1. setup and create a handle
   if (aocl_llc_setup(aocl_compression_handle, method) != 0)
   {
      printf("Setup: failed\n");
      goto error_exit;
   }
   setupDone = 1;

   // 2. allocate buffers (in MT builds, compressBound accounts for the RAP frame)
   aocl_compression_handle->inSize = file_size;
   resultCompBound = aocl_llc_compressBound(method, aocl_compression_handle->inSize);
   if (resultCompBound < 0)
   {
      printf("CompressBound: failed\n");
      goto error_exit;
   }
   aocl_compression_handle->outSize = resultCompBound;
   inPtr = (char*)calloc(1, aocl_compression_handle->inSize);
   compPtr = (char*)calloc(1, aocl_compression_handle->outSize);
   decompPtr = (char*)calloc(1, aocl_compression_handle->inSize);
   if (!inPtr || !compPtr || !decompPtr)
   {
      printf("Allocation: failed\n");
      goto error_exit;
   }
   aocl_compression_handle->inBuf = inPtr;
   aocl_compression_handle->outBuf = compPtr;
   if (fread(inPtr, 1, file_size, inFp) != (size_t)file_size)
   {
      printf("Read: failed\n");
      goto error_exit;
   }

   // 3. MT compress
   resultComp = aocl_llc_compress(aocl_compression_handle, method);

   if (resultComp <= 0)
   {
      printf("Compression: failed\n");
      goto error_exit;
   }
   printf("Compression: done\n");

   // 4. ST decompress
   // Get number of bytes for the RAP frame
   rap_frame_len = aocl_llc_skip_rap_frame(compPtr, (int32_t)resultComp);
   if (rap_frame_len < 0 || rap_frame_len > resultComp) // defensive check on the returned length
   {
      printf("Skip RAP frame: failed\n");
      goto error_exit;
   }

   // Skip RAP frame in input stream and pass this to ST decompressor
   aocl_compression_handle->inSize = resultComp - rap_frame_len;
   aocl_compression_handle->outSize = file_size;
   aocl_compression_handle->inBuf = compPtr + rap_frame_len;
   aocl_compression_handle->outBuf = decompPtr;

   // Pass format compliant stream to aocl decompressor (or any legacy ST decompressor)
   resultDecomp = aocl_llc_decompress(aocl_compression_handle, method);

   if (resultDecomp <= 0)
   {
      printf("Decompression Failure\n");
      goto error_exit;
   }
   printf("Decompression: done\n");
   ret = 0;

error_exit:
   // 5. destroy handle (also on failure paths, once setup has succeeded)
   if (setupDone)
      aocl_llc_destroy(aocl_compression_handle, method);
   if (inFp)
      fclose(inFp);
   free(inPtr);     // free(NULL) is a no-op
   free(compPtr);
   free(decompPtr);
   return ret;
}

To build this example program on a Linux system using GCC or AOCC, specify the path to the aocl_compression.h header file and link with the libaocl_compression.so file as follows:

$ gcc test.c -I<aocl_compression.h file path> -L<libaocl_compression.so file path> -laocl_compression

14.6. Optional Build Parameters

AOCL-Compression provides options to configure the library to best suit your use case. Unless noted otherwise, these optional features are disabled by default and must be enabled as needed.

The following optional features are available:

Table 14.1 Optional Features

| Option | Description | Use case |
|---|---|---|
| AOCL_ENABLE_THREADS | Enable multi-threaded compression and decompression using SMP-based OpenMP threads. [Values: ON / OFF (default)] | Use multiple threads to speed up compression/decompression |
| AOCL_ENABLE_LOG_FEATURE | Enable logging support in the library. The log level is determined by the environment variable AOCL_ENABLE_LOG. [Values: ON / OFF (default)] | Debugging/troubleshooting |
| AOCL_EXCLUDE_BZIP2, AOCL_EXCLUDE_LZ4, AOCL_EXCLUDE_LZ4HC, AOCL_EXCLUDE_LZMA, AOCL_EXCLUDE_SNAPPY, AOCL_EXCLUDE_ZLIB, AOCL_EXCLUDE_ZSTD | Exclude one or more compression methods from the library. For example, to build a library with only LZ4, enable all of these flags except AOCL_EXCLUDE_LZ4. [Values: ON / OFF (default)] | Support only a subset of compression methods because of library size or performance concerns |
| AOCL_XZ_UTILS_LZMA_API_EXPERIMENTAL | The LZMA implementation used by AOCL is from the 7z LZMA SDK. If the xz utils LZMA APIs are desired, this option enables wrappers around the 7z LZMA compress and decompress functions that match the xz APIs. Note: experimental feature; not all APIs are supported yet. [Values: ON / OFF (default)] | The xz utils APIs are required |
| LZ4_FRAME_FORMAT_SUPPORT | LZ4 compressed data can consist of raw compressed blocks or be packed inside a frame. If enabled, APIs that generate frame-format output, such as LZ4F_compressFrame, are available. [Values: ON (default) / OFF] | Use the LZ4 frame format APIs to produce LZ4 output in frame format |

The library also provides options to enable certain optimizations at both compile time and run time. These options affect the runtime performance of the library. For details, refer to the AOCL Performance Tuning Guide (https://docs.amd.com/go/en-US/63859-AOCL-performance-tuning-guide/).

14.7. Optional Runtime Parameters

Runtime options for configuring the library are available through environment variables.

14.7.1. Instruction Set Selection

  • The dynamic dispatcher in AOCL-Compression executes the optimal function variant based on the ISA supported by the machine at runtime.

  • Optionally, AOCL optimizations can be restricted to certain ISAs by setting the environment variable AOCL_ENABLE_INSTRUCTIONS. Supported values are SSE2, AVX, AVX2, and AVX512.

  • The environment variable needs to be set before launching the application for it to take effect.
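One way to model this restriction is as a cap on the dispatcher's choice: the variant used is the best one that both the machine supports and AOCL_ENABLE_INSTRUCTIONS allows. The C sketch below illustrates that rule using the numbering from the optLevel field of aocl_compression_desc (SSE2 = 1, AVX = 2, AVX2 = 3, AVX512 = 4). It is not the library's dispatcher. isa_from_name() and effective_isa() are hypothetical, and treating an unset or unrecognized value as "no restriction" is an assumption.

```c
#include <string.h>

/* ISA levels, numbered as in the optLevel field of aocl_compression_desc. */
enum { ISA_SSE2 = 1, ISA_AVX = 2, ISA_AVX2 = 3, ISA_AVX512 = 4 };

/* Hypothetical: map an AOCL_ENABLE_INSTRUCTIONS value to a level; -1 if unset or unknown. */
int isa_from_name(const char *value)
{
  if (value == NULL)                  return -1;
  if (strcmp(value, "SSE2") == 0)     return ISA_SSE2;
  if (strcmp(value, "AVX") == 0)      return ISA_AVX;
  if (strcmp(value, "AVX2") == 0)     return ISA_AVX2;
  if (strcmp(value, "AVX512") == 0)   return ISA_AVX512;
  return -1;
}

/* Hypothetical model: start from the highest level the CPU supports and
   lower it to the requested cap when a valid cap is given. A cap above
   the CPU's level cannot raise it. */
int effective_isa(int cpuLevel, const char *envValue)
{
  int cap = isa_from_name(envValue);
  if (cap < 0 || cap > cpuLevel)
    return cpuLevel;
  return cap;
}
```

In this model, effective_isa(ISA_AVX512, "AVX2") selects the AVX2 level, and effective_isa(ISA_AVX2, "AVX512") stays at AVX2. A real caller would pass getenv("AOCL_ENABLE_INSTRUCTIONS"), which requires <stdlib.h>.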