AOCL-LibM - 5.2 English - 68552

AOCL API Guide (68552)

Document ID: 68552
Release Date: 2025-12-29
Version: 5.2 English

9. AOCL-LibM

AOCL-LibM is a high-performance implementation of LibM, the standard C library of elementary floating-point mathematical functions. It includes many of the functions from the C99 standard. Single-precision and double-precision versions of the functions are provided, along with a small number of complex functions, all optimized for accuracy and performance. There are also vector and fast scalar variants, in which a small amount of accuracy is traded for greater performance.

9.1. API Overview

LibM functions are categorized into different types based on their mathematical domains and operational characteristics. Each category encompasses a set of related functions designed to provide optimal performance for specific computational tasks. Each LibM function can have the following variants:

  1. Scalar - Single precision (32-bit) and double precision (64-bit)

  2. Vector

    1. 128-bit - 4 × 32-bit values / 2 × 64-bit values

    2. 256-bit - 8 × 32-bit values / 4 × 64-bit values

    3. 512-bit - 16 × 32-bit values / 8 × 64-bit values

    4. Array - Variable-length array operations
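The element counts listed above follow from dividing the register width by the element width. A minimal sketch in standard C, where `lanes_per_register` is a hypothetical helper used only for illustration and is not part of AOCL-LibM:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not an AOCL-LibM API): number of elements that fit
 * in a SIMD register of `register_bits` bits when each element occupies
 * `element_bytes` bytes. */
size_t lanes_per_register(size_t register_bits, size_t element_bytes)
{
    assert(element_bytes > 0);
    return register_bits / (8 * element_bytes);
}
```

With a 4-byte `float` and an 8-byte `double`, a 256-bit register gives 8 and 4 lanes respectively, matching the list.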

Note

Accuracy Considerations

  1. For scalar functions, IEEE 754 recommends correctly rounded results, that is, a maximum error of 0.5 ULP (units in the last place). However, not all AOCL-LibM scalar APIs achieve a maximum error of 0.5 ULP.

  2. For vector variants, AOCL-LibM maintains a maximum error of 4 ULP.

  3. For fast scalar variants, AOCL-LibM maintains a maximum error of 4 ULP. These variants also do not handle special cases, edge cases, or invalid inputs.

9.1.1. Naming Convention

Scalar Functions

For scalar functions, an f at the end of the function name indicates that it is single-precision; otherwise, it is double-precision. For example:

  • exp() - double precision exponential function

  • expf() - single precision exponential function

Fast Scalar Functions

Fast scalar functions use the prefix amd_fast followed by the function name. These functions trade a small amount of accuracy for higher performance and do not handle special cases such as NaN or infinity inputs. For example:

  • amd_fastexp() - fast double precision exponential function

  • amd_fastexpf() - fast single precision exponential function

Vector Functions

The following naming convention is used for the vector functions:

amd_vr<type><vec_size>_<func>

where,

  • v - vector

  • r - real

  • <type> - s for single precision and d for double precision

  • <vec_size> - 4, 8, or 16 for single-precision; 2, 4, or 8 for double-precision; or a if it is a vector array function

  • <func> - function name, such as exp or expf
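The convention above can be sketched as a small name builder in standard C. `vector_func_name` is a hypothetical helper written for illustration, not an AOCL-LibM API, and passing 0 as the vector size to request the array form is a choice made for this sketch only.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not an AOCL-LibM API): compose a vector function
 * name following amd_vr<type><vec_size>_<func>.
 *   type     : 's' (single precision) or 'd' (double precision)
 *   vec_size : element count, or 0 to request the array form ("a")
 * Returns the snprintf result: the length of the full name, which is
 * >= out_size if the name did not fit and was truncated. */
int vector_func_name(char *out, size_t out_size,
                     char type, int vec_size, const char *func)
{
    assert(type == 's' || type == 'd');
    assert(func != NULL && strlen(func) > 0);

    if (vec_size == 0)
        return snprintf(out, out_size, "amd_vr%ca_%s", type, func);
    return snprintf(out, out_size, "amd_vr%c%d_%s", type, vec_size, func);
}
```

The helper does not check that a given combination exists in the library; it only shows how the parts of the name fit together.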

For example, a single precision 4-element version of exp has the signature:

__m128 amd_vrs4_expf(__m128 x);

Similarly, a double precision 8-element version of sin would be:

__m512d amd_vrd8_sin(__m512d x);

And an array function for single precision cos would be:

void amd_vrsa_cosf(int n, float *x, float *y);

9.1.2. Function Categories

AOCL-LibM provides two complementary ways to browse and access the API documentation. Choose the categorization that best suits your needs:

By Mathematical Domain

Functions grouped by their mathematical category (trigonometric, exponential, logarithmic, etc.). Each category contains all available variants (scalar, fast scalar, and vector) of the functions within that mathematical domain.

When to use this view:

  • When you know which mathematical operation you need (e.g., sine, logarithm, power)

  • When you want to see all available implementations of a specific mathematical function

  • When implementing algorithms that require specific mathematical operations

By Implementation Variant

Functions grouped by their execution model and performance characteristics. This view is useful when you need to optimize code for specific hardware capabilities or performance requirements.

When to use this view:

  • When optimizing performance-critical code paths

  • When targeting specific SIMD instruction sets (AVX, AVX2, AVX512)

  • When you need to process arrays of data efficiently

  • When accuracy requirements allow for fast variants