12. AOCL-LibMem
AOCL-LibMem is a Linux library of data movement and manipulation
functions (such as memcpy() and strcpy()) highly optimized for the AMD
“Zen” micro-architecture. It has multiple implementations of each
function, supporting the AVX2, AVX512, and ERMS CPU features. The
default choice is the best-fit implementation based on the underlying
micro-architectural support for CPU features and instructions. It
also supports a tunable build, in which a specific implementation can
be chosen for the mem* functions according to application requirements,
with alignment, instruction choice, and threshold values as tunable
parameters.
This release of the AOCL-LibMem library supports the following functions:
memcpy
mempcpy
memmove
memset
memcmp
memchr
strcpy
strncpy
strcmp
strncmp
strlen
strcat
strstr
Note
Behavior might be undefined if AVX512 is disabled in the BIOS configuration on the Zen5 platform.
12.1. Building AOCL-LibMem for Linux
Minimum software requirements for compilation:
GCC 12.2
AOCC 4.0
Python 3.10
CMake 3.22
Complete the following steps to build AOCL-LibMem for Linux:
Download and install the AOCL master installer (aocl-linux-<compiler>-<version>.tar.gz) from:
Locate the aocl-libmem folder in the root directory.
Configure for one of the following builds as required:
GCC
Default Native Build
$ cmake -D CMAKE_C_COMPILER=gcc -S <source_dir> -B <build_dir>
Cross Compiling AVX2 Binary on AVX512 Machine
$ cmake -D CMAKE_C_COMPILER=gcc -D ALMEM_ARCH=avx2 -S <source_dir> -B <build_dir>
Cross Compiling AVX512 Binary on AVX2 Machine
$ cmake -D CMAKE_C_COMPILER=gcc -D ALMEM_ARCH=avx512 -S <source_dir> -B <build_dir>
Enabling Tunable Parameters
$ cmake -D CMAKE_C_COMPILER=gcc -D ENABLE_TUNABLES=Y -S <source_dir> -B <build_dir>
AOCC (Clang)
Default Native Build
$ cmake -D CMAKE_C_COMPILER=clang -S <source_dir> -B <build_dir>
Cross Compiling AVX2 Binary on AVX512 Machine
$ cmake -D CMAKE_C_COMPILER=clang -D ALMEM_ARCH=avx2 -S <source_dir> -B <build_dir>
Cross Compiling AVX512 Binary on AVX2 Machine
$ cmake -D CMAKE_C_COMPILER=clang -D ALMEM_ARCH=avx512 -S <source_dir> -B <build_dir>
Enabling Tunable Parameters
$ cmake -D CMAKE_C_COMPILER=clang -D ENABLE_TUNABLES=Y -S <source_dir> -B <build_dir>
Build:
$ cmake --build <build_dir>
Install:
$ cmake --install <build_dir> # For a custom install path, run the configure step with -D CMAKE_INSTALL_PREFIX=<install_dir>
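The configure, build, and install steps above can be chained in a small shell helper. This is only a sketch: the function name `libmem_build`, its argument order, and the `CMAKE` override variable are hypothetical and not part of AOCL-LibMem. It uses only the CMake options shown above plus the standard `CMAKE_INSTALL_PREFIX` variable.

```shell
# Hypothetical helper (not shipped with AOCL-LibMem): configure, build, and
# install into a custom prefix.
# Usage: libmem_build <source_dir> <build_dir> <install_dir> [compiler]
libmem_build() {
    src=$1
    bld=$2
    prefix=$3
    cc=${4:-gcc}                 # gcc by default; pass clang for AOCC
    cmake_bin=${CMAKE:-cmake}    # assumption: set CMAKE=echo for a dry run

    # Configure with the chosen compiler and install prefix.
    "$cmake_bin" -D CMAKE_C_COMPILER="$cc" -D CMAKE_INSTALL_PREFIX="$prefix" \
        -S "$src" -B "$bld" || return 1
    # Build, then install.
    "$cmake_bin" --build "$bld" || return 1
    "$cmake_bin" --install "$bld"
}
```

Running `CMAKE=echo libmem_build <source_dir> <build_dir> <install_dir>` prints the arguments of each step without invoking CMake, which can help when reviewing the options before a real build.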
Note
Both the shared (libaocl-libmem.so) and static (libaocl-libmem.a) library files are installed under the <build_dir>/lib/ path.
Dynamic Dispatcher is not supported. Do not load or run the AVX512 library on a non-AVX512 machine, as it will crash on unsupported instructions.
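Because there is no dynamic dispatch, it can help to check the target machine's CPU flags before choosing an `ALMEM_ARCH` value or deploying a prebuilt library. The sketch below reads the `flags` line of `/proc/cpuinfo`. The function name `detect_almem_arch` is hypothetical, and treating the `avx512f` flag as a sufficient indicator of AVX512 support for AOCL-LibMem is an assumption, not something the library documents.

```shell
# Hypothetical helper (not part of AOCL-LibMem): print "avx512" or "avx2"
# depending on the CPU flags reported by the kernel.
# Reads /proc/cpuinfo by default; another file can be passed for testing.
detect_almem_arch() {
    cpuinfo=${1:-/proc/cpuinfo}
    # Take the first "flags" line; all cores normally report the same set.
    flags=$(grep -m1 '^flags' "$cpuinfo") || return 1
    # Pad with spaces so each flag matches as a whole word.
    case " $flags " in
        *" avx512f "*) echo avx512 ;;   # assumption: avx512f implies AVX512 build is usable
        *" avx2 "*)    echo avx2 ;;
        *)             echo unsupported; return 1 ;;
    esac
}
```

The result could be passed to the configure step, for example `-D ALMEM_ARCH="$(detect_almem_arch)"`, when running the check on the machine where the library will be used.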
12.2. Running an Application
Applications can preload the AOCL-LibMem shared library to replace the standard C library memory functions for better performance on AMD “Zen” micro-architectures.
To run the application, preload the libaocl-libmem.so generated by the build procedure above:
$ LD_PRELOAD=<path to build/lib/libaocl-libmem.so> <executable> <params>
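A mistyped library path is easy to miss with `LD_PRELOAD`, since the dynamic loader typically reports the problem and continues with the standard C library. A minimal wrapper, sketched below, checks that the library file is readable before launching the program. The function name `run_with_libmem` is hypothetical and not part of AOCL-LibMem.

```shell
# Hypothetical wrapper (not part of AOCL-LibMem): preload the library only
# if the file is readable, otherwise fail instead of running without it.
# Usage: run_with_libmem <path/to/libaocl-libmem.so> <executable> [params...]
run_with_libmem() {
    lib=$1
    shift
    if [ ! -r "$lib" ]; then
        echo "AOCL-LibMem library not found: $lib" >&2
        return 1
    fi
    # LD_PRELOAD is set only for this one command.
    LD_PRELOAD="$lib" "$@"
}
```

For example, `run_with_libmem <build_dir>/lib/libaocl-libmem.so <executable> <params>` behaves like the command above but returns a non-zero status when the library path is wrong.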