1. Introduction
AMD Optimizing CPU Libraries (AOCL) are a set of numerical libraries optimized for AMD “Zen”-based processors, including EPYC™, Ryzen™ Threadripper™, and Ryzen™. This document provides instructions on installing and using all the AMD optimized libraries.
AOCL comprises the following libraries:
AOCL-BLAS is a portable software framework for performing high-performance Basic Linear Algebra Subprograms (BLAS) functionality.
AOCL-LAPACK is a portable library for dense matrix computations that provides the functionality present in the Linear Algebra Package (LAPACK).
AOCL-FFTW (Fastest Fourier Transform in the West) is a comprehensive collection of fast C routines for computing the Discrete Fourier Transform (DFT) and various special cases.
AOCL-LibM is a software library containing elementary math functions optimized for x86-64 processor-based machines.
AOCL-Utils is a library that provides APIs to query the available CPU features and flags, cache topology, and related details of AMD “Zen”-based CPUs.
AOCL-ScaLAPACK is a library of high-performance linear algebra routines for parallel distributed memory machines. It depends on external libraries including BLAS and LAPACK for linear algebra computations.
AOCL-RNG is a library that provides a set of pseudo-random number generators, quasi-random number generators, and statistical distribution functions optimized for AMD “Zen”-based processors.
AOCL-SecureRNG is a library that provides APIs to access the cryptographically secure random numbers generated by the AMD hardware random number generator.
AOCL-Sparse is a library containing the basic linear algebra subroutines for sparse matrices and vectors optimized for AMD “Zen”-based CPUs.
AOCL-LibMem is AMD’s optimized implementation of memory manipulation functions for AMD “Zen”-based CPUs.
AOCL-Cryptography is AMD’s optimized implementation of cryptographic functions.
AOCL-Compression is a software framework of various lossless data compression and decompression methods tuned and optimized for AMD “Zen”-based CPUs.
AOCL-DA is a data analytics library providing optimized building blocks for data analysis and classical machine learning problems.
All the above libraries are open-source except AOCL-RNG.
1.1. Feature Support Matrix
The following two tables (AOCL Feature Support Matrix - 1 and AOCL Feature Support Matrix - 2) summarize the supported features and dependencies of the AOCL libraries:
| Library/Feature | Vector ISA | Dynamic Dispatcher | Precision | Optional ISA Selection by the User |
|---|---|---|---|---|
| AOCL-BLAS | AVX2, AVX512 | Yes | Single, Double, Complex, Double Complex. Mixed Precision and Low Precision (INT16, INT8, UINT8, BFLOAT16) are currently supported only for the GEMM and GEMV APIs. | Yes, using AOCL_ENABLE_INSTRUCTIONS |
| AOCL-LAPACK | AVX2, AVX512 | Partially (requires AVX2 support) | Single, Double, Complex, Double Complex | Yes, using AOCL_ENABLE_INSTRUCTIONS |
| AOCL-FFTW | AVX2, AVX512 | Yes on Linux with GCC and AOCC. No on Windows with Clang. The MSVC compiler has not been used on Windows. | Single, Double, Long-double, Quad | No |
| AOCL-LibM | AVX2, AVX512 | Yes | Single, Double, Complex, Double Complex | No |
| AOCL-Sparse | AVX2, AVX512 | Partial (only for selected APIs) | Single, Double, Complex, Double Complex | Partial (only for selected APIs), using AOCL_ENABLE_INSTRUCTIONS |
| AOCL-Cryptography | AVX2, AVX512 | Yes, with GCC and AOCC on Linux and Clang on Windows | N/A | No |
| AOCL-Compression | AVX2. AVX512 instructions are not used, but the library can be built with the -mavx512f compiler option. | Yes, with GCC and AOCC on Linux and Clang on Windows | N/A | Yes, using AOCL_ENABLE_INSTRUCTIONS |
| AOCL-RNG | AVX2, AVX512 | Yes | Single, Double, Integer | Yes, using AOCL_ENABLE_INSTRUCTIONS |
| AOCL-SecureRNG | N/A | N/A | N/A | N/A |
| AOCL-ScaLAPACK | Dependent on the underlying BLAS and LAPACK libraries | Dependent on the underlying BLAS and LAPACK libraries | Single, Double, Complex, Double Complex | Dependent on the underlying BLAS and LAPACK libraries |
| AOCL-LibMem | AVX2, AVX512 | No | N/A | No |
| AOCL-Utils | N/A | N/A | N/A | N/A |
| AOCL-DA | Dependent on the underlying BLAS library | No | Single, Double | Dependent on the underlying BLAS library |

| Library/Feature | glibc Dependency | Single-threaded | Multi-threaded | MPI |
|---|---|---|---|---|
| AOCL-BLAS | Yes | Yes | Yes | No |
| AOCL-LAPACK | Yes | Yes | Yes | No |
| AOCL-FFTW | Yes | Yes | Yes | Yes |
| AOCL-LibM | Yes | Yes | No | No |
| AOCL-Sparse | Yes | Yes | Partial (only for selected APIs) | No |
| AOCL-Cryptography | Yes | Yes | No | No |
| AOCL-Compression | Yes | Yes | Partial (for LZ4, LZ4HC, Snappy, ZLIB, and ZSTD) | No |
| AOCL-RNG | Yes | Yes | No | No |
| AOCL-SecureRNG | No | Yes | No | No |
| AOCL-ScaLAPACK | Yes | Yes, dependent on the underlying BLAS and LAPACK libraries | Yes, dependent on the underlying BLAS and LAPACK libraries | Yes |
| AOCL-LibMem | Yes | Yes | No | No |
| AOCL-Utils | Yes | Yes | No | No |
| AOCL-DA | Yes | Yes | Yes | No |

Dynamic Dispatch facilitates building a single binary compatible with all the AMD “Zen” architectures. At runtime, this feature enables optimizations specific to the detected AMD “Zen” architecture.
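The idea behind dynamic dispatch can be illustrated with a minimal Python sketch. It is not how any AOCL library implements the feature: the kernel functions, the `KERNELS` table, and `select_kernel` are hypothetical names, and the flag strings merely follow the spelling Linux uses in `/proc/cpuinfo`. The sketch only shows the general pattern of one binary carrying several code paths and choosing among them at runtime.

```python
from typing import Callable, Iterable, List, Sequence, Tuple

Kernel = Callable[[Sequence[float], Sequence[float]], float]


def dot_generic(x: Sequence[float], y: Sequence[float]) -> float:
    """Portable fallback that runs on any CPU."""
    return sum(a * b for a, b in zip(x, y))


def dot_avx2(x: Sequence[float], y: Sequence[float]) -> float:
    """Stand-in for an AVX2-tuned kernel (hypothetical; same arithmetic here)."""
    return sum(a * b for a, b in zip(x, y))


def dot_avx512(x: Sequence[float], y: Sequence[float]) -> float:
    """Stand-in for an AVX512-tuned kernel (hypothetical; same arithmetic here)."""
    return sum(a * b for a, b in zip(x, y))


# Ordered from most to least specialized; the first supported entry wins.
KERNELS: List[Tuple[str, Kernel]] = [
    ("avx512f", dot_avx512),
    ("avx2", dot_avx2),
]


def select_kernel(cpu_flags: Iterable[str]) -> Kernel:
    """Return the most specialized kernel that the detected flags allow."""
    flags = set(cpu_flags)
    for required_flag, kernel in KERNELS:
        if required_flag in flags:
            return kernel
    return dot_generic


if __name__ == "__main__":
    # Flags are hard-coded for the demo; a real dispatcher queries the CPU.
    kernel = select_kernel({"sse2", "avx", "avx2"})
    print(kernel.__name__, kernel([1.0, 2.0], [3.0, 4.0]))  # prints: dot_avx2 11.0
```

The selection runs once, at startup or on first call, so the per-call cost of dispatching is a single indirect call rather than repeated feature checks.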
You can find the flags to enable or disable the applicable features (listed in AOCL Feature Support Matrix - 1 and AOCL Feature Support Matrix - 2) in the individual library sections.
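For the libraries whose table entry mentions user ISA selection, the choice is made through an environment variable that must be set before the application starts. The following Python sketch shows one way to launch a program with that variable set. It makes two assumptions: the variable name is read from the feature table as `AOCL_ENABLE_INSTRUCTIONS`, and `avx2` is used as an example value; the accepted values for each library should be taken from its own section. The helper `env_with_isa` is a hypothetical name, and the child process here only echoes the variable back instead of running a real AOCL-linked application.

```python
import os
import subprocess
import sys
from typing import Dict, Mapping, Optional


def env_with_isa(isa: str, base_env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with the ISA selection variable set.

    The variable name and the meaning of `isa` are assumptions; check the
    individual library section for the values it accepts.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["AOCL_ENABLE_INSTRUCTIONS"] = isa
    return env


if __name__ == "__main__":
    # Placeholder child process: prints the variable it received.
    child = [
        sys.executable,
        "-c",
        "import os; print(os.environ['AOCL_ENABLE_INSTRUCTIONS'])",
    ]
    result = subprocess.run(
        child, env=env_with_isa("avx2"), capture_output=True, text=True, check=True
    )
    print(result.stdout.strip())
```

Building a fresh environment dictionary, rather than modifying `os.environ` in place, keeps the setting scoped to the child process.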
Additionally, AMD provides Spack (https://spack.io/) recipes for installing AOCL-BLAS, AOCL-LAPACK, AOCL-ScaLAPACK, AOCL-LibM, AOCL-FFTW, AOCL-Sparse, AOCL-Compression, AOCL-Cryptography, AOCL-DA, and AOCL-Utils libraries.
For more information on the AOCL release and installers, refer to AMD Developer Central (https://www.amd.com/en/developer/aocl.html).
For any issues or queries on the libraries, send an email to toolchainsupport@amd.com.
To determine the underlying architecture of your AMD system, refer to Check AMD Server Processor Architecture.
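As a rough first check on Linux, the CPU family reported in `/proc/cpuinfo` narrows down the “Zen” generation. The Python sketch below is an illustration only, not the procedure described in the referenced section: `zen_generation` and `ZEN_BY_FAMILY` are hypothetical names, and the family number alone cannot separate generations that share a family, which requires the model number as well.

```python
import re
from typing import Optional

# AMD CPU family number -> "Zen" generations that report it.
ZEN_BY_FAMILY = {
    0x17: "Zen, Zen+ or Zen 2",
    0x19: "Zen 3, Zen 3+ or Zen 4",
    0x1A: "Zen 5",
}


def zen_generation(cpuinfo_text: str) -> Optional[str]:
    """Map /proc/cpuinfo-style text to candidate "Zen" generations.

    Returns None for non-AMD CPUs or for families not in the table.
    """
    vendor = re.search(r"^vendor_id\s*:\s*(\S+)", cpuinfo_text, re.MULTILINE)
    family = re.search(r"^cpu family\s*:\s*(\d+)", cpuinfo_text, re.MULTILINE)
    if not vendor or not family or vendor.group(1) != "AuthenticAMD":
        return None
    return ZEN_BY_FAMILY.get(int(family.group(1)))


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo", encoding="utf-8") as cpuinfo:
            print(zen_generation(cpuinfo.read()))
    except OSError:
        print("/proc/cpuinfo is not available on this system")
```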