AOCL-LibM - 5.0 English - 57404

AOCL User Guide (57404)

Document ID
57404
Release Date
2024-12-14
Version
5.0 English

7. AOCL-LibM

AOCL-LibM is a high-performance implementation of LibM, the standard C library of elementary floating-point mathematical functions. It includes many of the functions from the C99 standard. Single- and double-precision versions of the functions are provided, along with a small number of complex functions, all optimized for accuracy and performance. There are also vector and fast scalar variants, in which a small amount of accuracy has been traded for greater performance.

Note

  1. Behavior might be undefined if AVX512 is disabled in the BIOS configuration on the Zen5 platform.

7.1. Library Contents

7.1.1. Scalar Functions

A list of the scalar functions present in the library is provided below.

Note

A trailing f in a function name indicates the single-precision version; names without it are double-precision. The functions are called by their standard C99 names, and AOCL-LibM must precede the standard libm in the link order.

For example:

$ export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/path/to/AOCL-LibM_library

$ clang -Wall -std=c99 myprogram.c -o myprogram -L<Path to AOCL-LibM Library> -lalm -lm
Or
$ gcc -Wall -std=c99 myprogram.c -o myprogram -L<Path to AOCL-LibM Library> -lalm -lm
  • Trigonometric

    • cosf, cos, sinf, sin, tanf, tan, sincosf, sincos, hypotf, and hypot

  • Inverse Trigonometric

    • acosf, acos, asinf, asin, atanf, atan, atan2f, and atan2

  • Hyperbolic

    • coshf, cosh, sinhf, sinh, tanhf, and tanh

  • Inverse Hyperbolic

    • acoshf, acosh, asinhf, asinh, atanhf, and atanh

  • Exponential and Logarithmic

    • expf, exp, exp2f, exp2, exp10f, exp10, expm1f, and expm1

    • logf, log, log10f, log10, log2f, log2, log1pf, and log1p

    • logbf, logb, ilogbf, and ilogb

    • ldexpf and ldexp

    • scalbnf, scalbn, scalblnf, and scalbln

  • Error Function

    • erff and erf

  • Power

    • powf, pow, cbrtf, cbrt, sqrtf, and sqrt

  • Nearest Integer

    • ceilf, ceil, floorf, floor, truncf, and trunc

    • rintf, rint, roundf, round, nearbyintf, and nearbyint

    • lrintf, lrint, llrintf, and llrint

    • lroundf, lround, llroundf, and llround

  • Remainder

    • fmodf, fmod, remainderf, and remainder

  • Manipulation

    • fabsf and fabs

    • copysignf, copysign, nanf, nan, finitef, and finite

    • modff, modf, frexpf, and frexp

    • nextafterf, nextafter, nexttowardf, and nexttoward

  • Maximum, Minimum, and Positive Difference

    • fmaxf, fmax, fminf, fmin, fdimf, and fdim

Also, there are a small number of complex scalar functions: cpowf, cpow, clogf, clog, cexpf, and cexp.

7.1.2. Fast Scalar and Vector Variants

Faster but less accurate versions of some of the scalar functions are available in the library libalmfast.so. It contains fast versions of acosf, acos, asinf, asin, atanf, atan, cosf, cos, erff, erf, expf, exp, logf, log, powf, pow, sinf, sin, tanf, and tan. To use these functions, link directly to this library, placing it before libalm.so in the link order. Alternatively, the fast versions can be selected at run time by setting LD_PRELOAD=/path-to/libalmfast.so, or enabled through certain AOCC compiler flags. For more information, refer to the AOCC 5.0 user guide.

AOCL-LibM includes vector variants for many of the core math functions as listed later in this section. A few caveats on both the fast scalar versions and the vector variants are as follows:

  • These routines trade off some of the accuracy for increased performance but should nevertheless have a maximum ULP error no greater than 4.0.

  • While these routines take advantage of the AMD64 architecture for performance, some improvements are also made by sacrificing error handling and input argument checking.

  • Abnormal inputs may produce unpredictable results. It is therefore the responsibility of the caller of these routines to ensure that their arguments are valid.

  • These variants do not set the IEEE error codes and hence, the user must not rely on them for doing so.

The vector variants can be enabled by using the AOCC compiler with the -ffast-math -fveclib=AMDLIBM flags. You can also call these functions directly; if you do, take care to avoid losing portability. Because these functions accept arguments of the types __m128, __m128d, __m256, __m256d, __m512, and __m512d, you must manually pack your data into the appropriate type before the call and unpack it afterwards.
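The packing and unpacking step can be sketched for the two-lane double-precision case. This is a minimal illustration that assumes an x86-64 target with SSE2. The function placeholder_vrd2 is a hypothetical stand-in that doubles each lane so that the sketch builds without AOCL-LibM; in real code it would be replaced by a library routine of the same shape, such as amd_vrd2_exp.

```c
#include <emmintrin.h> /* SSE2: __m128d, _mm_loadu_pd, _mm_storeu_pd */

/* Hypothetical stand-in for a two-lane double-precision vector routine.
 * It doubles each lane; it is NOT an AOCL-LibM function. */
static __m128d placeholder_vrd2(__m128d x)
{
    return _mm_add_pd(x, x);
}

/* Applies the two-lane routine to in[0] and in[1], writing to out[0..1]. */
static void apply_vrd2(const double *in, double *out)
{
    __m128d packed = _mm_loadu_pd(in);         /* pack: in[0] -> lane 0 */
    __m128d result = placeholder_vrd2(packed); /* vector call           */
    _mm_storeu_pd(out, result);                /* unpack to out[0..1]   */
}
```

The unaligned load and store intrinsics are used so that the arrays need no particular alignment; aligned variants exist but require 16-byte aligned pointers.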

The following naming convention is used for the vector functions:

amd_vr<type><vec_size>_<func>

where,

  • v - vector

  • r - real

  • <type> - s for single precision and d for double precision

  • <vec_size> - 4, 8, or 16 for single-precision; 2, 4, or 8 for double-precision; or ‘a’ if it is a vector array function

  • <func> - function name, such as exp or expf

For example, the single-precision, 4-element version of exp has the signature:

__m128 amd_vrs4_expf(__m128 x);
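The type and vec_size fields of a name together determine the SIMD register width, and therefore the argument type. The sketch below makes that arithmetic explicit; vector_register_bits is a hypothetical helper for illustration only, relying on the fact that a float is 32 bits and a double is 64 bits.

```c
/* Hypothetical helper (not part of AOCL-LibM): width in bits of the SIMD
 * register implied by a vector function name. type is 's' (32-bit float)
 * or 'd' (64-bit double); lanes is the numeric <vec_size> field.
 * Returns 0 for an unrecognized type. */
static int vector_register_bits(char type, int lanes)
{
    int element_bits = (type == 's') ? 32 : (type == 'd') ? 64 : 0;
    return lanes * element_bits;
}
```

For instance, vrs4 gives 128 bits (an __m128 argument), vrd4 gives 256 bits (__m256d), and vrs16 gives 512 bits (__m512). The array variants (vec_size 'a') take arrays rather than SIMD registers, so this calculation does not apply to them.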

The list of available vector functions is as follows:

Note

All these functions have an amd_ prefix, but this has been omitted in the following list for brevity.

  • Exponential

    • vrs8_expf and vrs8_exp2f

    • vrs4_expf, vrs4_exp2f, vrs4_exp10f, and vrs4_expm1f

    • vrsa_expf, vrsa_exp2f, vrsa_exp10f, and vrsa_expm1f

    • vrd2_exp, vrd2_exp2, vrd2_exp10, vrd2_expm1, vrd4_exp, and vrd4_exp2

    • vrda_exp, vrda_exp2, vrda_exp10, and vrda_expm1

    • vrs16_expf and vrs16_exp2f

    • vrd8_exp and vrd8_exp2

  • Logarithmic

    • vrs8_logf, vrs8_log2f, and vrs8_log10f

    • vrs4_logf, vrs4_log2f, vrs4_log10f, and vrs4_log1pf

    • vrd4_log and vrd4_log2

    • vrsa_logf, vrsa_log2f, vrsa_log10f, and vrsa_log1pf

    • vrd2_log, vrd2_log2, vrd2_log10, and vrd2_log1p

    • vrda_log, vrda_log2, vrda_log10, and vrda_log1p

    • vrs16_logf, vrs16_log2f, and vrs16_log10f

    • vrd8_log and vrd8_log2

  • Trigonometric

    • vrs4_cosf, vrs8_cosf, vrs4_sinf, and vrs8_sinf

    • vrsa_cosf, vrsa_sinf, and vrsa_sincosf

    • vrd4_sin, vrd4_cos, vrd4_tan, and vrd4_sincos

    • vrd2_cos, vrd2_sin, vrd2_tan, and vrd2_sincos

    • vrda_cos, vrda_sin, and vrda_sincos

    • vrs16_cosf, vrs16_sinf, and vrs16_tanf

    • vrd8_cos, vrd8_sin, vrd8_tan, and vrd8_sincos

  • Inverse Trigonometric

    • vrs4_acosf, vrs4_asinf, and vrs8_asinf

    • vrs4_atanf, vrs8_atanf, and vrd2_atan

    • vrs16_atanf, vrs16_asinf, and vrs16_acosf

    • vrd8_atan and vrd8_asin

  • Hyperbolic

    • vrs4_coshf and vrs4_tanhf

    • vrs8_coshf and vrs8_tanhf

    • vrs16_tanhf

  • Power and Root functions

    • vrs4_powf, vrd2_pow, vrd4_pow, vrs8_powf, vrda_pow and vrsa_powf

    • vrs16_powf and vrd8_pow

    • vrd2_powx, vrd4_powx and vrd8_powx

    • vrs4_powxf, vrs8_powxf and vrs16_powxf

    • vrd2_sqrt, vrd4_sqrt, vrd8_sqrt and vrda_sqrt

    • vrs4_sqrtf, vrs8_sqrtf, vrs16_sqrtf and vrsa_sqrtf

  • Error Function

    • vrs4_erff, vrd2_erf, vrs8_erff, and vrd4_erf

    • vrs16_erff and vrd8_erf

  • Arithmetic Functions

    • vrsa_addf, vrsa_addfi, vrda_add, and vrda_addi

    • vrsa_subf, vrsa_subfi, vrda_sub, and vrda_subi

    • vrsa_mulf, vrsa_mulfi, vrda_mul, and vrda_muli

    • vrsa_divf, vrsa_divfi, vrda_div, and vrda_divi

    • vrsa_fmaxf, vrsa_fmaxfi, vrda_fmax, and vrda_fmaxi

    • vrsa_fminf, vrsa_fminfi, vrda_fmin, and vrda_fmini

    • vrd2_fabs, vrd4_fabs and vrda_fabs

    • vrs4_fabsf, vrs8_fabsf and vrsa_fabsf

    • vrd2_linearfrac, vrd4_linearfrac, vrd8_linearfrac and vrda_linearfrac

    • vrs4_linearfracf, vrs8_linearfracf, vrs16_linearfracf and vrsa_linearfracf

7.2. Installation

7.2.1. Installing the Pre-Built Binaries on Linux

The AOCL-LibM binary for Linux, compiled with AOCC and GCC, is available at the following URL:

https://www.amd.com/en/developer/aocl/libm.html

The AOCL-LibM library can also be installed from the AOCL master installer tar/deb/rpm/exe files available on AMD Developer Central (https://www.amd.com/en/developer/aocl.html).

The tar and zip files include pre-built binaries of other AOCL libraries as explained in Using Master Package.

7.2.2. Building AOCL-LibM on Linux

Software requirements for compilation:

  • GCC versions 9.2 through 13.1

    It is recommended to use GCC 9.2 or later, because 9.2 is the version in which AMD “Zen2” compiler optimizations were introduced.

    AMD “Zen3” compiler optimizations were added at GCC 10.3 and AMD “Zen4” at 12.3.

  • Clang 12.0.0 (AOCC 3.0) through Clang 17.0.0 (AOCC 4.2)

  • Virtualenv with Python 3.6 or later

  • SCons version 3.1.1 or later

  • libstdc++ (required for AOCL-Utils)

The minimum and maximum permitted versions of GCC and Clang are set in the file scripts/site_scons/alm/check.py. You can edit it to allow the use of other compiler versions.

Refer to Installing AOCL to install the AOCL-Utils library. Then, complete the following steps to compile AOCL-LibM:

  1. Download source from GitHub (amd/aocl-libm-ose).

  2. Navigate to the LibM folder and checkout the branch aocl-5.0:

    $ cd aocl-libm-ose
    $ git checkout aocl-5.0
    
  3. Create a virtual environment:

    $ virtualenv -p python3 .venv3
    
  4. Activate the virtual environment:

    $ source .venv3/bin/activate
    
  5. Install SCons:

    $ pip install scons
    
  6. Compile AOCL-LibM:

    Basic build command: scons --aocl_utils_install_path=<libaoclutils library path>
    
    Additional Flags
    
    Build in parallel: -j<number of parallel builds>
    Installation: install --prefix=<path to install>
    Compiler selection: ALM_CC=<gcc/clang executable path>
    ALM_CXX=<g++/clang++ executable path>
    Verbosity: --verbose=1
    Debug mode build: --debug_mode=libs
    
  7. By default, the libraries (static and dynamic) will be compiled and generated in the following location:

    aocl-libm-ose/build/aocl-release/src/

    If a debug mode build has been selected, the libraries (static and dynamic) will instead be compiled and generated in the following location:

    aocl-libm-ose/build/aocl-debug/src

    If the installation option is used, the libraries will also be copied to the directory <path to install>/lib.

7.2.3. Building AOCL-LibM on Windows

Minimum software requirements for compilation:

  • Windows 10/11 or Windows Server 2019/2022

  • LLVM compiler V14.0 for AMD “Zen3” or AMD “Zen4” support (or LLVM compiler V11.0 for AMD “Zen2” support)

  • Microsoft Visual Studio 2019 build 16 or 2022 build 17

  • Windows SDK Version 10.0.19041.0

  • Virtualenv with Python 3.6 or later

  • SCons 4.4.0

  • libstdc++ (required for AOCL-Utils)

Refer to Installing AOCL to install the AOCL-Utils library. Then, complete the following steps to install AOCL-LibM:

  1. Download source from GitHub (amd/aocl-libm-ose).

  2. Navigate to the folder:

    $ cd aocl-libm-ose
    
  3. Install virtualenv:

    $ pip install virtualenv
    
  4. Initialize the environment for the correct architecture by running the Visual Studio vcvarsall.bat file:

    $ "C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Auxiliary\Build\vcvarsall.bat" amd64
    
  5. Create and activate a virtual environment, then install SCons in it:

    $ virtualenv -p python .venv3
    $ .venv3\Scripts\activate
    $ pip install scons
    
  6. Build the project using the clang compiler:

    Basic build command: scons ALM_CC=<clang-cl executable path> ALM_CXX=<clang-cl executable path> --aocl_utils_install_path="<libaoclutils library path>"
    
    Additional Flags:
    
    Build in parallel: -j<number of parallel builds>
    Verbosity: --verbose=1
    Debug mode build: --debug_mode=libs
    

    For example:

    $ scons -j32 ALM_CC="C:\PROGRA~1\LLVM\bin\clang-cl.exe" ALM_CXX="C:\PROGRA~1\LLVM\bin\clang-cl.exe" --verbose=1
    

By default, the static (libalm-static.lib) and dynamic (libalm.dll and libalm.lib) libraries are compiled and generated in the following location:

aocl-libm-ose/build/aocl-release/src/

If a debug mode build has been selected, the libraries will instead be compiled and generated in the following location:

aocl-libm-ose/build/aocl-debug/src

7.2.4. Building AOCL-LibM on Linux Using CMake

The minimum required CMake version is 3.22.

To list the CMake configuration preset names:

$ cmake --list-presets

To configure CMake, select a preset name from the --list-presets output:

$ cmake --preset dev-release-gcc --fresh

To build, select the corresponding preset name from the cmake --build --list-presets output:

$ cmake --build --preset dev-release-gcc

To build the library in parallel:

$ cmake --build --preset dev-release-gcc -j

To build the library in verbose mode:

$ cmake --build --preset dev-release-gcc -v

The CMake-built aocl-libm library is installed only in release mode, in build/{presetName}.

7.3. Using AOCL-LibM

To use AOCL-LibM in your application, complete the following steps:

  1. Include math.h, the standard header for the C library math functions.

  2. Link the appropriate version of the library into your program.

    The Linux libraries may have a dependency on the system math library. When linking AOCL-LibM, ensure that it precedes the system math library in the link order, that is, -lalm must appear before -lm. The explicit linking of the system math library is required when using the GCC or AOCC compilers. Such explicit linking is not required with the g++ compiler (for C++).

Example: myprogram.c

#include <stdio.h>
#include <math.h>

int main() {
  float f = 3.14f;
  printf ("%f\n", expf(f));
  return 0;
}

To use AOCL-LibM scalar functions, use the following commands (cc can be gcc or clang):

$ export LD_LIBRARY_PATH=<Path to libalm.so>:$LD_LIBRARY_PATH
$ cc -Wall -std=c99 myprogram.c -o myprogram -L<Path to libalm.so> -lalm -lm
$ ./myprogram

You can access the vector calls by using the AOCC compiler with the flags -ffast-math -fveclib=AMDLIBM.

You can also call the functions directly, which requires manual packing and unpacking. To do so, you must include the header file amdlibm_vec.h. The following program shows such an example. For simplicity, the size and other checks are omitted.

Example: myprogram.c

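#define AMD_LIBM_VEC_EXTERNAL_H
#define AMD_LIBM_VEC_EXPERIMENTAL

#include <immintrin.h>   /* __m128 and the _mm_* intrinsics */
#include "amdlibm_vec.h"

/* Four-lane single-precision exp, named per the convention in Section 7.1.2. */
__m128 amd_vrs4_expf(__m128 x);

__m128 test_expf_v4s(float *ip, float *out)
{
  /* _mm_set_ps takes its arguments from the highest lane to the lowest,
     so ip[0] ends up in lane 0. */
  __m128 ip4 = _mm_set_ps(ip[3], ip[2], ip[1], ip[0]);
  __m128 op4 = amd_vrs4_expf(ip4);
  /* Unaligned store: out is not guaranteed to be 16-byte aligned. */
  _mm_storeu_ps(&out[0], op4);

  return op4;
}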

You can compile myprogram.c as follows:

$ export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/path/to/AOCL-LibM
$ clang -Wall -std=c99 -ffast-math myprogram.c -o myprogram -L<path to libalm.so> -lalm -lm

For more details on usage, refer to the examples folder in the release package, which contains example source code and a makefile.