7. AOCL-LibM#
AOCL-LibM is a high-performance implementation of LibM, the standard C library of elementary floating-point mathematical functions. It includes many of the functions from the C99 standard. Single- and double-precision versions of the functions are provided, along with a small number of complex functions, all optimized for accuracy and performance. There are also a number of vector and fast scalar variants, in which a small amount of accuracy has been traded for greater performance.
Note
Behavior might be undefined if AVX512 is disabled in the BIOS configuration on the Zen5 platform.
7.1. Library Contents#
7.1.1. Scalar Functions#
A list of the scalar functions present in the library is provided below.
Note
An f at the end of the function name indicates that it is single-precision; otherwise, it is double-precision.
The functions are called by their standard C99 names, and AOCL-LibM must be linked before the standard libm.
For example:
$ export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/path/to/AOCL-LibM_library
$ clang -Wall -std=c99 myprogram.c -o myprogram -L<Path to AOCL-LibM Library> -lalm -lm
Or
$ gcc -Wall -std=c99 myprogram.c -o myprogram -L<Path to AOCL-LibM Library> -lalm -lm
Trigonometric
cosf, cos, sinf, sin, tanf, tan, sincosf, sincos, hypotf, and hypot
Inverse Trigonometric
acosf, acos, asinf, asin, atanf, atan, atan2f, and atan2
Hyperbolic
coshf, cosh, sinhf, sinh, tanhf, and tanh
Inverse Hyperbolic
acoshf, acosh, asinhf, asinh, atanhf, and atanh
Exponential and Logarithmic
expf, exp, exp2f, exp2, exp10f, exp10, expm1f, and expm1
logf, log, log10f, log10, log2f, log2, log1pf, and log1p
logbf, logb, ilogbf, and ilogb
ldexpf and ldexp
scalbnf, scalbn, scalblnf, and scalbln
Error Function
erff and erf
Power
powf, pow, cbrtf, cbrt, sqrtf, and sqrt
Nearest Integer
ceilf, ceil, floorf, floor, truncf, and trunc
rintf, rint, roundf, round, nearbyintf, and nearbyint
lrintf, lrint, llrintf, and llrint
lroundf, lround, llroundf, and llround
Remainder
fmodf, fmod, remainderf, and remainder
Manipulation
fabsf and fabs
copysignf, copysign, nanf, nan, finitef, and finite
modff, modf, frexpf, and frexp
nextafterf, nextafter, nexttowardf, and nexttoward
Maximum, Minimum, and Positive Difference
fmaxf, fmax, fminf, fmin, fdimf, and fdim
Also, there are a small number of complex scalar functions: cpowf, cpow, clogf, clog, cexpf, and cexp.
7.1.2. Fast Scalar and Vector Variants#
Faster but less accurate versions of some of the scalar functions are available in the library libalmfast.so. It contains fast versions of acosf, acos, asinf, asin, atanf, atan, cosf, cos, erff, erf, expf, exp, logf, log, powf, pow, sinf, sin, tanf, and tan. To use these functions, link directly to this library before libalm.so. Alternatively, the fast versions can be selected by setting LD_PRELOAD=/path-to/libalmfast.so or enabled through certain AOCC compiler flags. For more information, refer to the AOCC 5.0 user guide.
AOCL-LibM includes vector variants for many of the core math functions as listed later in this section. A few caveats on both the fast scalar versions and the vector variants are as follows:
These routines trade off some of the accuracy for increased performance but should nevertheless have a maximum ULP error no greater than 4.0.
While these routines take advantage of the AMD64 architecture for performance, some improvements are also made by sacrificing error handling and input argument checking.
Abnormal inputs may produce unpredictable results. It is therefore the responsibility of the caller of these routines to ensure that their arguments are valid.
These variants do not set the IEEE error codes, so callers must not rely on those codes being set.
The vector variants can be enabled by using the AOCC compiler with the -ffast-math -fveclib=AMDLIBM flags. You can also call these functions directly; if doing so, you must take care to avoid losing portability. As these functions accept arguments in the __m128, __m128d, __m256, __m256d, __m512, and __m512d types, you must manually pack and later unpack to and from the appropriate data type.
The following naming convention is used for the vector functions:
amd_vr<type><vec_size>_<func>
where,
v - vector
r - real
<type> - s for single precision and d for double precision
<vec_size> - 4, 8, or 16 for single-precision; 2, 4, or 8 for double-precision; or ‘a’ if it is a vector array function
<func> - function name, such as exp or expf
For example, the single-precision, 4-element version of exp has the signature:
__m128 amd_vrs4_expf(__m128 x);
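To make the naming convention concrete, the sketch below composes a function name from its parts. The helper alm_vector_name is hypothetical and exists only for illustration. It checks the precision and vector size against the rules above, but it does not check that the resulting function is actually provided by the library.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: build "amd_vr<type><vec_size>_<func>" in buf.
 *   precision: 's' (single) or 'd' (double)
 *   lanes:     4, 8, 16 for single; 2, 4, 8 for double; 0 for the array form 'a'
 *   func:      base name, including the trailing f for single precision
 * Returns 0 on success, -1 on invalid arguments or if buf is too small.
 */
int alm_vector_name(char *buf, size_t size, char precision, int lanes,
                    const char *func)
{
    int lanes_ok;
    int written;

    if (buf == NULL || func == NULL || strlen(func) == 0)
        return -1;
    if (precision == 's')
        lanes_ok = (lanes == 0 || lanes == 4 || lanes == 8 || lanes == 16);
    else if (precision == 'd')
        lanes_ok = (lanes == 0 || lanes == 2 || lanes == 4 || lanes == 8);
    else
        lanes_ok = 0;
    if (!lanes_ok)
        return -1;

    if (lanes == 0)
        written = snprintf(buf, size, "amd_vr%ca_%s", precision, func);
    else
        written = snprintf(buf, size, "amd_vr%c%d_%s", precision, lanes, func);

    /* snprintf reports the length it wanted; treat truncation as failure. */
    return (written < 0 || (size_t)written >= size) ? -1 : 0;
}
```

For instance, the arguments 's', 4, and "expf" give amd_vrs4_expf, while 'd', 0, and "log" give the array form amd_vrda_log.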
The list of available vector functions is as follows:
Note
All these functions have an amd_ prefix, but this has been omitted in the following list for brevity.
Exponential
vrs8_expf and vrs8_exp2f
vrs4_expf, vrs4_exp2f, vrs4_exp10f, and vrs4_expm1f
vrsa_expf, vrsa_exp2f, vrsa_exp10f, and vrsa_expm1f
vrd2_exp, vrd2_exp2, vrd2_exp10, vrd2_expm1, vrd4_exp, and vrd4_exp2
vrda_exp, vrda_exp2, vrda_exp10, and vrda_expm1
vrs16_expf and vrs16_exp2f
vrd8_exp and vrd8_exp2
Logarithmic
vrs8_logf, vrs8_log2f, and vrs8_log10f
vrs4_logf, vrs4_log2f, vrs4_log10f, and vrs4_log1pf
vrd4_log and vrd4_log2
vrsa_logf, vrsa_log2f, vrsa_log10f, and vrsa_log1pf
vrd2_log, vrd2_log2, vrd2_log10, and vrd2_log1p
vrda_log, vrda_log2, vrda_log10, and vrda_log1p
vrs16_logf, vrs16_log2f, and vrs16_log10f
vrd8_log and vrd8_log2
Trigonometric
vrs4_cosf, vrs8_cosf, vrs4_sinf, and vrs8_sinf
vrsa_cosf, vrsa_sinf, and vrsa_sincosf
vrd4_sin, vrd4_cos, vrd4_tan, and vrd4_sincos
vrd2_cos, vrd2_sin, vrd2_tan, and vrd2_sincos
vrda_cos, vrda_sin, and vrda_sincos
vrs16_cosf, vrs16_sinf, and vrs16_tanf
vrd8_cos, vrd8_sin, vrd8_tan, and vrd8_sincos
Inverse Trigonometric
vrs4_acosf, vrs4_asinf, and vrs8_asinf
vrs4_atanf, vrs8_atanf, and vrd2_atan
vrs16_atanf, vrs16_asinf, and vrs16_acosf
vrd8_atan and vrd8_asin
Hyperbolic
vrs4_coshf and vrs4_tanhf
vrs8_coshf and vrs8_tanhf
vrs16_tanhf
Power and Root functions
vrs4_powf, vrd2_pow, vrd4_pow, vrs8_powf, vrda_pow and vrsa_powf
vrs16_powf and vrd8_pow
vrd2_powx, vrd4_powx and vrd8_powx
vrs4_powxf, vrs8_powxf and vrs16_powxf
vrd2_sqrt, vrd4_sqrt, vrd8_sqrt and vrda_sqrt
vrs4_sqrtf, vrs8_sqrtf, vrs16_sqrtf and vrsa_sqrtf
Error Function
vrs4_erff, vrd2_erf, vrs8_erff, and vrd4_erf
vrs16_erff and vrd8_erf
Arithmetic Functions
vrsa_addf, vrsa_addfi, vrda_add, and vrda_addi
vrsa_subf, vrsa_subfi, vrda_sub, and vrda_subi
vrsa_mulf, vrsa_mulfi, vrda_mul, and vrda_muli
vrsa_divf, vrsa_divfi, vrda_div, and vrda_divi
vrsa_fmaxf, vrsa_fmaxfi, vrda_fmax, and vrda_fmaxi
vrsa_fminf, vrsa_fminfi, vrda_fmin, and vrda_fmini
vrd2_fabs, vrd4_fabs and vrda_fabs
vrs4_fabsf, vrs8_fabsf and vrsa_fabsf
vrd2_linearfrac, vrd4_linearfrac, vrd8_linearfrac and vrda_linearfrac
vrs4_linearfracf, vrs8_linearfracf, vrs16_linearfracf and vrsa_linearfracf
7.2. Installation#
7.2.1. Installing the Pre-Built Binaries on Linux#
The AOCL-LibM binary for Linux, compiled with AOCC and GCC, is available at the following URL:
https://www.amd.com/en/developer/aocl/libm.html
The AOCL-LibM library can also be installed from the AOCL master installer tar/deb/rpm/exe files available on AMD Developer Central (https://www.amd.com/en/developer/aocl.html).
The tar and zip files include pre-built binaries of other AOCL libraries as explained in Using Master Package.
7.2.2. Building AOCL-LibM on Linux#
Software requirements for compilation:
GCC versions 9.2 through 13.1
It is recommended to use GCC 9.2 or later, as 9.2 is the version in which AMD “Zen2” compiler optimizations were introduced.
AMD “Zen3” compiler optimizations were added at GCC 10.3 and AMD “Zen4” at 12.3.
Clang 12.0.0 (AOCC 3.0) through Clang 17.0.0 (AOCC 4.2)
Virtualenv with Python 3.6 or later
SCons version 3.1.1 or later
libstdc++ (required for AOCL-Utils)
The minimum and maximum permitted versions of GCC and Clang are set in the file scripts/site_scons/alm/check.py. You can edit it to allow the use of other compiler versions.
Refer to Installing AOCL to install the AOCL-Utils library. Then, complete the following steps to compile AOCL-LibM:
Download source from GitHub (amd/aocl-libm-ose).
Navigate to the LibM folder and checkout the branch aocl-5.0:
$ cd aocl-libm-ose
$ git checkout aocl-5.0
Create a virtual environment:
$ virtualenv -p python3 .venv3
Activate the virtual environment:
$ source .venv3/bin/activate
Install SCons:
$ pip install scons
Compile AOCL-LibM:
Basic build command:
scons --aocl_utils_install_path=<libaoclutils library path>
Additional flags:
Build in parallel: -j<number of parallel builds>
Installation: install --prefix=<path to install>
Compiler selection: ALM_CC=<gcc/clang executable path> ALM_CXX=<g++/clang++ executable path>
Verbosity: --verbose=1
Debug mode build: --debug_mode=libs
By default, the libraries (static and dynamic) will be compiled and generated in the following location:
aocl-libm-ose/build/aocl-release/src/
If a debug mode build has been selected, the libraries (static and dynamic) will instead be compiled and generated in the following location:
aocl-libm-ose/build/aocl-debug/src
If the installation option is used, the libraries will also be copied to the directory <path to install>/lib.
7.2.3. Building AOCL-LibM on Windows#
Minimum software requirements for compilation:
Windows 10/11 or Windows Server 2019/2022
LLVM compiler V14.0 for AMD “Zen3” or AMD “Zen4” support (or LLVM compiler V11.0 for AMD “Zen2” support)
Microsoft Visual Studio 2019 build 16 or 2022 build 17
Windows SDK Version 10.0.19041.0
Virtualenv with Python 3.6 or later
SCons 4.4.0
libstdc++ (required for AOCL-Utils)
Refer to Installing AOCL to install the AOCL-Utils library. Then, complete the following steps to install AOCL-LibM:
Download source from GitHub (amd/aocl-libm-ose).
Navigate to the folder:
$ cd aocl-libm-ose
Install virtualenv:
$ pip install virtualenv
Initialize the environment for the correct architecture using the Visual Studio vcvarsall.bat file with the following command:
$ "C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Auxiliary\Build\vcvarsall.bat" amd64
Activate virtual environment and install SCons inside:
$ virtualenv -p python .venv3
$ .venv3\Scripts\activate
$ pip install scons
Build the project using the clang compiler:
Basic build command:
scons ALM_CC=<clang-cl executable path> ALM_CXX=<clang-cl executable path> --aocl_utils_install_path="<libaoclutils library path>"
Additional flags:
Build in parallel: -j<number of parallel builds>
Verbosity: --verbose=1
Debug mode build: --debug_mode=libs
For example:
$ scons -j32 ALM_CC="C:\PROGRA~1\LLVM\bin\clang-cl.exe" ALM_CXX="C:\PROGRA~1\LLVM\bin\clang-cl.exe" --verbose=1
By default, the static (libalm-static.lib) and dynamic (libalm.dll and libalm.lib) libraries are compiled and generated in the following location:
aocl-libm-ose/build/aocl-release/src/
If a debug mode build has been selected, the libraries will instead be compiled and generated in the following location:
aocl-libm-ose/build/aocl-debug/src
7.2.4. Building AOCL-LibM on Linux Using CMake#
The minimum required CMake version is 3.22.
To list the CMake configuration preset names:
$ cmake --list-presets
To configure cmake, select any preset name from --list-presets
$ cmake --preset dev-release-gcc --fresh
To build, select the corresponding config preset name from --build --list-presets
$ cmake --build --preset dev-release-gcc
To build library in parallel
$ cmake --build --preset dev-release-gcc -j
To build library in verbose mode
$ cmake --build --preset dev-release-gcc -v
The CMake-built aocl-libm library is installed only in release mode, and it is installed in build/{presetName}.
7.3. Using AOCL-LibM#
To use AOCL-LibM in your application, complete the following steps:
Include math.h as a standard way to use the C Standard library math functions.
Link in the appropriate version of the library in your program.
The Linux libraries may have a dependency on the system math library. When linking AOCL-LibM, ensure that it precedes the system math library in the link order, that is, -lalm must appear before -lm. The explicit linking of the system math library is required when using the GCC or AOCC compilers. Such explicit linking is not required with the g++ compiler (for C++).
Example: myprogram.c
#include <stdio.h>
#include <math.h>
int main() {
float f = 3.14f;
printf ("%f\n", expf(f));
return 0;
}
To use AOCL-LibM scalar functions, use the following commands, where cc can be gcc or clang:
$ export LD_LIBRARY_PATH=<Path to libalm.so>:$LD_LIBRARY_PATH
$ cc -Wall -std=c99 myprogram.c -o myprogram -L<Path to libalm.so> -lalm -lm
$ ./myprogram
You can access the vector calls by using the AOCC compiler with the flags -ffast-math -fveclib=AMDLIBM.
You can also call the functions directly, which requires manual packing and unpacking. To do so, you must include the header file amdlibm_vec.h. The following program shows such an example. For simplicity, the size and other checks are omitted.
Example: myprogram.c
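#define AMD_LIBM_VEC_EXTERNAL_H
#define AMD_LIBM_VEC_EXPERIMENTAL
#include <immintrin.h>   /* __m128 and the _mm_* intrinsics */
#include "amdlibm_vec.h"

/* Four-lane single-precision exp; vector functions carry the amd_ prefix. */
__m128 amd_vrs4_expf(__m128 x);

__m128 test_expf_v4s(float *ip, float *out)
{
    /* Pack four floats; _mm_set_ps takes the highest lane first. */
    __m128 ip4 = _mm_set_ps(ip[3], ip[2], ip[1], ip[0]);
    __m128 op4 = amd_vrs4_expf(ip4);
    /* Unpack with an unaligned store, as out may not be 16-byte aligned. */
    _mm_storeu_ps(&out[0], op4);
    return op4;
}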
You can compile myprogram.c as follows:
$ export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/path/to/AOCL-LibM
$ clang -Wall -std=c99 -ffast-math myprogram.c -o myprogram -L<path to libalm.so> -lalm -lm
For more details on usage, refer to the examples folder in the release package, which contains example source code and a makefile.
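The vector example in this section omits size checks. One possible way to handle an array whose length is not a multiple of four is sketched below. The names vec4_kernel, apply_vec4, square4, and square1 are hypothetical; square4 and square1 are stand-ins so that the sketch builds without AOCL-LibM. With the library linked, amd_vrs4_expf and expf could be passed in their place.

```c
#include <immintrin.h>
#include <stddef.h>

/* Signature shared by four-lane single-precision kernels. */
typedef __m128 (*vec4_kernel)(__m128);

/* Stand-in vector kernel: squares each of the four lanes. */
__m128 square4(__m128 x)
{
    return _mm_mul_ps(x, x);
}

/* Stand-in scalar fallback that matches square4 for a single value. */
float square1(float x)
{
    return x * x;
}

/*
 * Apply kernel to n floats, four at a time. Any leftover elements
 * (n % 4 of them) are handled by the scalar fallback.
 */
void apply_vec4(vec4_kernel kernel, float (*scalar_fallback)(float),
                const float *in, float *out, size_t n)
{
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        __m128 packed = _mm_loadu_ps(in + i);  /* pack: unaligned load */
        __m128 result = kernel(packed);
        _mm_storeu_ps(out + i, result);        /* unpack: unaligned store */
    }
    for (; i < n; ++i)
        out[i] = scalar_fallback(in[i]);
}
```

The unaligned load and store avoid any alignment requirement on the caller's buffers. Note that the vector kernel and the scalar fallback may differ slightly in accuracy, so the last few elements can be rounded differently from the rest.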