The Dynamic Dispatch feature supports the AMD “Zen”, AMD “Zen2”, AMD “Zen3”, AMD “Zen4”, and AMD “Zen5” architectures in a single binary. It also includes a generic architecture to support older x86-64 processors. The generic architecture uses a pure C implementation of the APIs and does not use any architecture-specific features.
The compiler flags used to build the library with the generic configuration are:
-O2 -funsafe-math-optimizations -ffp-contract=fast -Wall \
-Wno-unused-function -Wfatal-errors
Note
Because no architecture-specific optimizations or vectorized kernels are enabled, performance with the generic architecture may be significantly lower than with the architecture-specific implementations.
Previous AOCL-BLAS releases identified the processor based on Family,
Model, and other cpuid features, and selected the appropriate code
path from a set of preprogrammed choices. With Dynamic Dispatch, an
unknown processor would fall through to the slow generic code path,
although users could override this by setting the environment variable
BLIS_ARCH_TYPE to a suitable value.
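As an illustration of the override described above, the following sketch sets BLIS_ARCH_TYPE from inside a program, before any BLAS routine is called. The helper name request_blis_arch is hypothetical, and the value "generic" is assumed to be an accepted setting. It is also an assumption that the library reads the variable when it first initializes. Setting the variable in the shell before launching the program avoids relying on that timing.

```c
#define _POSIX_C_SOURCE 200112L  /* for setenv() */
#include <stdlib.h>

/*
 * Hypothetical helper: request a code path by name through the
 * BLIS_ARCH_TYPE environment variable. The third argument to
 * setenv() is 0, so a value the user already exported is kept.
 * Returns 0 on success and -1 on failure, as setenv() does.
 */
int request_blis_arch(const char *arch_name)
{
    return setenv("BLIS_ARCH_TYPE", arch_name, 0);
}
```

A caller would invoke request_blis_arch("generic") early in main(), before the first BLAS call. Passing 0 for the overwrite flag is a deliberate choice here, so that a setting made by the user on the command line takes precedence over the value chosen in the program.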
Starting with AOCL-BLAS 4.2, additional cpuid tests based on AVX2 and AVX512 instruction support allow the AMD “Zen3”, AMD “Zen4”, or AMD “Zen5” code paths to be selected by default on suitable x86-64 processors (that is, future AMD processors and current or future Intel processors). These AMD Zen code paths are not (re-)optimized specifically for these different architectures but should perform better than the slow generic code path.
To be more specific:
AVX2 support requires AVX2 and FMA3.
AVX512 support requires AVX512 F, DQ, CD, BW, and VL.
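The two requirements above can be sketched as a check on raw cpuid register values. This is a minimal illustration, not the library's actual detection code: the names classify_isa_tier, isa_tier_t, and the TIER_* constants are hypothetical. The bit positions follow the x86 cpuid definition (leaf 1 ECX for FMA, leaf 7 subleaf 0 EBX for AVX2 and the AVX512 subsets). A real check would also need to confirm that the operating system saves the wider register state, which is omitted here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Feature bits as defined by the x86 cpuid instruction. */
#define CPUID1_ECX_FMA       (1u << 12)  /* leaf 1, ECX */
#define CPUID7_EBX_AVX2      (1u << 5)   /* leaf 7, subleaf 0, EBX */
#define CPUID7_EBX_AVX512F   (1u << 16)
#define CPUID7_EBX_AVX512DQ  (1u << 17)
#define CPUID7_EBX_AVX512CD  (1u << 28)
#define CPUID7_EBX_AVX512BW  (1u << 30)
#define CPUID7_EBX_AVX512VL  (1u << 31)

/* Hypothetical feature tiers; these are not named code paths. */
typedef enum { TIER_GENERIC, TIER_AVX2, TIER_AVX512 } isa_tier_t;

/* True only if every bit in mask is set in reg. */
bool has_all(uint32_t reg, uint32_t mask)
{
    return (reg & mask) == mask;
}

/*
 * Classify a processor from two cpuid registers:
 *   leaf1_ecx: ECX returned by cpuid leaf 1
 *   leaf7_ebx: EBX returned by cpuid leaf 7, subleaf 0
 */
isa_tier_t classify_isa_tier(uint32_t leaf1_ecx, uint32_t leaf7_ebx)
{
    const uint32_t avx512_mask = CPUID7_EBX_AVX512F | CPUID7_EBX_AVX512DQ |
                                 CPUID7_EBX_AVX512CD | CPUID7_EBX_AVX512BW |
                                 CPUID7_EBX_AVX512VL;

    /* AVX512 tier: all five subsets (F, DQ, CD, BW, VL) must be present. */
    if (has_all(leaf7_ebx, avx512_mask))
        return TIER_AVX512;

    /* AVX2 tier: both AVX2 and FMA3 must be present. */
    if (has_all(leaf7_ebx, CPUID7_EBX_AVX2) &&
        has_all(leaf1_ecx, CPUID1_ECX_FMA))
        return TIER_AVX2;

    /* Otherwise only the generic tier applies. */
    return TIER_GENERIC;
}
```

On GCC or Clang, the two register values could be obtained with __get_cpuid and __get_cpuid_count from <cpuid.h>. Keeping the classification separate from the cpuid call makes the rule easy to test with fixed inputs. For example, a processor that reports AVX512 F, DQ, CD, and BW but not VL falls back to the AVX2 tier, provided AVX2 and FMA3 are present.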