AOCL-LibM - 5.0 English

AOCL Performance Tuning Guide (63859)

Document ID: 63859
Release Date: 2024-10-10
Version: 5.0 English

6. AOCL-LibM

AOCL-LibM provides three variants of its math routines: scalar, vector, and fast. The vector and fast variants deliver higher performance than their scalar counterparts at the cost of reduced accuracy.

6.1. Fast Scalar Variants

Faster versions of some scalar routines are available in a separate library, libalmfast.so. These routines omit the corner-case and special-case checks that the IEEE 754 standard mandates for scalar math routines, so they may produce less accurate results for such inputs. The library contains fast versions of the following functions:

  • acosf, acos, asinf, asin, atanf, atan, cosf, cos, erff, erf, expf, exp, logf, log, powf, pow, sinf, sin, tanf and tan

Fast versions can be selected by setting LD_PRELOAD=/path/to/libalmfast.so, or by using certain flags with the AOCC compiler. For more information, refer to the AOCC 5.0 User Guide.

6.2. Vector Variants

AOCL-LibM includes vector variants of many of the core math functions. For a complete list of the available functions, refer to the AOCL 5.0 User Guide. A few important points about the AOCL-LibM vector routines:

  • These routines trade some accuracy for increased performance. However, their maximum error is no more than 4.0 ULP (units in the last place)

  • While these routines take advantage of the AMD64 architecture for performance, some of the improvement also comes from omitting error handling and input argument checks

  • Abnormal inputs may lead to unpredictable results. The caller is therefore responsible for ensuring that the arguments are valid

  • These variants do not set IEEE error codes, so the caller must not rely on them to do so

Vector variants of AOCL-LibM can be enabled by compiling with the AOCC compiler and the -ffast-math -fveclib=AMDLIBM flags. These functions can also be called directly. However, they accept input arguments of type __m128d, __m128, __m256d, __m256, __m512d, and __m512. Hence, to avoid losing portability, the input values must be packed into these types and the results unpacked manually.