6. AOCL-LibM
AOCL-LibM provides three variants of its math routines: scalar, vector, and fast. The vector and fast variants deliver higher performance than their scalar counterparts at the cost of reduced accuracy.
6.1. Fast Scalar Variants
Faster versions of some of the scalar routines are available in a separate library, libalmfast.so. Note that these routines omit the corner-case and special-case checks that the IEEE 754 standard mandates for scalar math routines, so their results may be less accurate for such inputs. The library contains fast versions of the following functions:
acosf, acos, asinf, asin, atanf, atan, cosf, cos, erff, erf, expf, exp, logf, log, powf, pow, sinf, sin, tanf, and tan
Fast versions can be selected by setting LD_PRELOAD=/path/to/libalmfast.so or by using certain flags with the AOCC compiler. For more information, refer to the AOCC 5.0 User Guide.
6.2. Vector Variants
AOCL-LibM includes vector variants of many of the core math functions. For the complete list of available functions, refer to the AOCL 5.0 User Guide. A few important points about the AOCL-LibM vector routines:
These routines trade some accuracy for increased performance. However, their maximum error does not exceed 4.0 ULP.
While these routines take advantage of the AMD64 architecture for performance, some of the gain also comes from omitting error handling and input argument checks.
Abnormal inputs may lead to unpredictable results. It is therefore the caller's responsibility to ensure that the arguments are valid.
These variants do not set IEEE error codes, so the user must not rely on them to do so.
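To make the ULP bound concrete, one common way to compare a routine's single-precision output with a reference value is to count how many representable floats lie between them. The sketch below illustrates that measurement under stated assumptions: the helper names are hypothetical, the reference values are assumed to come from a higher-precision computation rounded to float, and the integer distance only approximates the fractional ULP error quoted above.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Map a float onto an integer line that preserves numeric ordering,
 * so adjacent floats differ by exactly 1. Both zeros map to 0. */
static int64_t float_to_ordered(float x)
{
    int32_t bits;
    memcpy(&bits, &x, sizeof bits);    /* well-defined type punning */
    return bits < 0 ? (int64_t)INT32_MIN - bits : (int64_t)bits;
}

/* Number of representable floats between a and b.
 * Not meaningful when either argument is NaN. */
static int64_t ulp_distance(float a, float b)
{
    int64_t d = float_to_ordered(a) - float_to_ordered(b);
    return d < 0 ? -d : d;
}

/* Largest ULP distance between computed and reference results. */
static int64_t max_ulp_distance(const float *got, const float *ref, size_t n)
{
    int64_t worst = 0;
    for (size_t i = 0; i < n; ++i) {
        int64_t d = ulp_distance(got[i], ref[i]);
        if (d > worst)
            worst = d;
    }
    return worst;
}
```

A test harness could fill got from the vector routine and ref from a scalar double-precision computation, then check that max_ulp_distance stays within the expected bound over the input range of interest.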
Vector variants of AOCL-LibM can be enabled by compiling with the AOCC compiler and the -ffast-math -fveclib=AMDLIBM flags. These functions can also be called directly. However, they accept input arguments of type __m128d, __m128, __m256d, __m256, __m512d, and __m512. Hence, to avoid losing portability, the input values must be packed manually and the results unpacked afterwards.
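The manual packing and unpacking can be sketched as follows for the __m128d case, which holds two doubles. This is an illustration only: vec_kernel_2d is a hypothetical stand-in (it simply computes a square root with an SSE2 intrinsic) for whichever AOCL-LibM vector routine is actually called, whose name and header should be taken from the AOCL User Guide, and apply_vec_2d is likewise a made-up helper name.

```c
#include <emmintrin.h>   /* SSE2: __m128d, _mm_loadu_pd, _mm_storeu_pd */
#include <stddef.h>

/* HYPOTHETICAL stand-in for a 2-lane double-precision vector routine.
 * Replace the body with a call to the real AOCL-LibM function. */
static __m128d vec_kernel_2d(__m128d x)
{
    return _mm_sqrt_pd(x);
}

/* Apply the vector routine to a plain array of n doubles. */
static void apply_vec_2d(const double *in, double *out, size_t n)
{
    size_t i = 0;

    /* Full vectors: pack two inputs, call, unpack two results. */
    for (; i + 2 <= n; i += 2) {
        __m128d x = _mm_loadu_pd(in + i);     /* pack (unaligned load) */
        __m128d y = vec_kernel_2d(x);
        _mm_storeu_pd(out + i, y);            /* unpack (unaligned store) */
    }

    /* Odd tail: fill the unused high lane with a valid dummy value,
     * since the vector routines do not check their arguments. */
    if (i < n) {
        __m128d x = _mm_set_pd(1.0, in[i]);   /* high = 1.0, low = in[i] */
        __m128d y = vec_kernel_2d(x);
        out[i] = _mm_cvtsd_f64(y);            /* keep only the low lane */
    }
}
```

Keeping the intrinsics inside one helper like this means the rest of the program works on ordinary arrays, and only the helper needs to change for the wider __m256d or __m512d types or for a non-x86 build.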