3. AOCL-LAPACK
AOCL-LAPACK provides several build-time and run-time options to improve
performance for different use cases. Multi-threading, x86 ISA selection,
and AOCL-specific optimizations are the most prominent. In general,
setting the flag ENABLE_AMD_FLAGS to ON during CMake configuration turns
on many optimization and interface options. The following subsections
describe these options and their effect on the performance of
AOCL-LAPACK. The options can be enabled or disabled either at configure
time, before the build, or at run time through the corresponding
environment variables.
3.1. Enable AMD Optimizations
All performance optimizations and other library features added by AMD
are enabled by setting ENABLE_AMD_FLAGS to ON for the GCC compiler or
ENABLE_AMD_AOCC_FLAGS to ON for the AOCC compiler. These options turn on
the following features:
- AOCL-LAPACK performance optimizations for the Zen family of CPUs
- Shared-memory parallelization using OpenMP
- Use of the Extended BLAS APIs available in AOCL-BLAS
3.2. Enable / Disable Multithreading
AOCL-LAPACK supports multi-threading using OpenMP in selected APIs.
This feature is enabled by default when AOCL-LAPACK is compiled with
ENABLE_AMD_FLAGS=ON or ENABLE_AMD_AOCC_FLAGS=ON. You can disable
multi-threading by setting ENABLE_MULTITHREADING=NO.
The LAPACK interface APIs that support multi-threading automatically
choose an optimal number of threads. You can also set the number of
threads explicitly through the environment variable OMP_NUM_THREADS or
the OpenMP runtime APIs. In that case, the number of threads is selected
as follows:
| Thread Criteria | Threads Used by API |
|---|---|
| User-specified threads are greater than the AOCL-LAPACK computed optimal threads | AOCL-LAPACK computed optimal threads |
| User-specified threads are less than the AOCL-LAPACK computed optimal threads | User-specified threads |
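The rule in the table amounts to taking the smaller of the two thread counts. The sketch below models that rule in shell for illustration only: `effective_threads` is a hypothetical helper, not part of AOCL-LAPACK, and the optimal value of 4 is an assumed number, since the library computes the real value internally.

```shell
#!/bin/sh
# Illustrative model of the thread-selection rule in the table above.
# effective_threads is a hypothetical helper, not an AOCL-LAPACK API.
#   $1 = threads requested by the user (for example, via OMP_NUM_THREADS)
#   $2 = optimal thread count computed by AOCL-LAPACK (internal to the library)
effective_threads() {
    if [ "$1" -gt "$2" ]; then
        echo "$2"   # request exceeds the optimum: the optimum is used
    else
        echo "$1"   # request is at or below the optimum: the request is used
    fi
}

# Assumed numbers, for illustration only.
export OMP_NUM_THREADS=8
effective_threads "$OMP_NUM_THREADS" 4    # prints 4
effective_threads 2 4                     # prints 2
```

In other words, OMP_NUM_THREADS acts as an upper bound on the threads an API uses rather than as an exact count.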
3.3. Build-Time ISA Selection
To provide good default performance across different architectures, the
default compiler flags are set to -mtune=native -mavx2 -mfma -O3 when
the library is compiled with the ENABLE_AMD_FLAGS or
ENABLE_AMD_AOCC_FLAGS option. As a result, AOCL-LAPACK requires at least
AVX2 and Fused Multiply-Add (FMA) support from the target CPU.
However, the library can be compiled with a different ISA flag, such as AVX512, depending on the ISA supported by the target CPU.
To do so, set the flag LF_ISA_CONFIG to the desired ISA. The available
options are Auto, AVX2 (default), AVX512, and None. For example:
$ cmake .. -DLF_ISA_CONFIG=AVX512 -DENABLE_AMD_FLAGS=ON
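When the build machine is also the target machine, the choice between AVX2 and AVX512 can be derived from the CPU feature flags. The following is a minimal sketch, assuming a Linux system whose /proc/cpuinfo reports the `avx2` and `avx512f` flags. `pick_isa_config` is a hypothetical helper, not part of AOCL-LAPACK, and the Auto value of LF_ISA_CONFIG may already cover this case.

```shell
#!/bin/sh
# pick_isa_config is a hypothetical helper (not part of AOCL-LAPACK).
#   $1 = space-separated CPU feature flags, as found on the "flags" line
#        of /proc/cpuinfo on Linux.
pick_isa_config() {
    case " $1 " in
        *" avx512f "*) echo "AVX512" ;;  # AVX-512 Foundation is present
        *" avx2 "*)    echo "AVX2" ;;    # AVX2 without AVX-512
        *)             echo "None" ;;    # neither flag was found
    esac
}

# Read the flags of the first CPU; empty if /proc/cpuinfo is unavailable.
cpu_flags=$(grep -m1 '^flags' /proc/cpuinfo 2>/dev/null | cut -d: -f2 || true)
isa=$(pick_isa_config "$cpu_flags")

# Print the configure command instead of running it.
if [ "$isa" = "None" ]; then
    echo "No avx2 or avx512f flag detected; choose LF_ISA_CONFIG manually."
else
    echo "cmake .. -DLF_ISA_CONFIG=$isa -DENABLE_AMD_FLAGS=ON"
fi
```

If the library is built on one machine and deployed on another, the flags of the deployment CPU are the ones that matter, so the value should be chosen by hand in that case.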
3.4. Run-Time ISA Selection
For select functions, AOCL-LAPACK automatically dispatches to a suitable
code path based on the ISA supported by the target CPU. You can override
this choice with the environment variable AOCL_ENABLE_INSTRUCTIONS.
Valid values are SSE2, AVX, AVX2, AVX512, and GENERIC. All values are
case-insensitive.
When you set AOCL_ENABLE_INSTRUCTIONS to an ISA higher than the target
CPU supports, AOCL-LAPACK chooses the best code path that the CPU
supports. If you choose a lower-level ISA, that ISA is used. Any ISA
selection lower than AVX2 defaults to the generic reference code path.
Case 1: On an AVX2-only machine (for example, AMD Zen1 / Zen2 / Zen3)
- Setting AOCL_ENABLE_INSTRUCTIONS=AVX2 takes the AVX2 path.
- Setting AOCL_ENABLE_INSTRUCTIONS=AVX512 takes the AVX2 path.
- Setting AOCL_ENABLE_INSTRUCTIONS to generic, sse2, or avx takes the reference path.
Case 2: On an AVX512 machine (for example, Zen4 / Zen5)
- Setting AOCL_ENABLE_INSTRUCTIONS=AVX512 takes the AVX512 path.
- Setting AOCL_ENABLE_INSTRUCTIONS=AVX2 takes the AVX2 path.
- Setting AOCL_ENABLE_INSTRUCTIONS to generic, sse2, or avx takes the reference path.
Case 3: Setting AOCL_ENABLE_INSTRUCTIONS to any value other than
avx512, avx2, avx, sse2, or generic results in an error.
Note that enabling the AMD flags for compilation sets the minimum ISA
requirement to AVX2 (see Section 3.3). In that case, the SSE2 and AVX
values of AOCL_ENABLE_INSTRUCTIONS are not meaningful. They apply only
if the library was built without the -mavx2 option.
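The cases above can be summarized as a small decision function. The sketch below is an illustrative model of the documented behavior, not the library's dispatch code: `resolve_isa_path` is a hypothetical helper, and it assumes the CPU supports at least AVX2.

```shell
#!/bin/sh
# Illustrative model of the run-time ISA selection rules described above.
# resolve_isa_path is a hypothetical helper, not an AOCL-LAPACK API.
#   $1 = value of AOCL_ENABLE_INSTRUCTIONS (case-insensitive)
#   $2 = highest ISA supported by the CPU: "avx2" or "avx512"
resolve_isa_path() {
    requested=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    cpu_max=$2
    case "$requested" in
        avx512)
            # A request above what the CPU supports falls back to AVX2.
            if [ "$cpu_max" = "avx512" ]; then
                echo "avx512"
            else
                echo "avx2"
            fi
            ;;
        avx2)
            echo "avx2"
            ;;
        avx|sse2|generic)
            # Anything below AVX2 maps to the generic reference path.
            echo "reference"
            ;;
        *)
            echo "error: invalid AOCL_ENABLE_INSTRUCTIONS value: $1" >&2
            return 1
            ;;
    esac
}

resolve_isa_path AVX512 avx2      # prints avx2
resolve_isa_path avx2 avx512      # prints avx2
resolve_isa_path SSE2 avx512      # prints reference
```

With a real application, the variable is set in the environment before launch, for example `export AOCL_ENABLE_INSTRUCTIONS=AVX2`.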
3.5. Using AOCL-BLAS
AOCL-LAPACK can be linked with any Netlib BLAS compliant library when compiled with the standard CMake options described in the AOCL User Guide. In addition, AOCL-LAPACK provides an option to link explicitly with the AOCL-BLAS library at compile time. This option lets AOCL-LAPACK invoke lower-level AOCL-BLAS APIs directly, which can result in better performance for certain APIs on AMD “Zen” CPUs. To force AOCL-LAPACK to use the AOCL-BLAS library, provide the option ENABLE_AOCL_BLAS in the CMake configuration:
$ cmake -DENABLE_AMD_AOCC_FLAGS=ON -DENABLE_AOCL_BLAS=ON ...
Enabling this option requires the path to the AOCL-BLAS library, which can be provided in one of the following ways:
- Set the AOCL_ROOT environment variable to the path where the AOCL-BLAS library ($AOCL_ROOT/lib) and header files ($AOCL_ROOT/include) are located:
  $ export AOCL_ROOT=<path to AOCL-BLAS>
- Specify the root path of the AOCL-BLAS library through the CMake option AOCL_ROOT:
  $ cmake -DENABLE_AMD_AOCC_FLAGS=ON -DENABLE_AOCL_BLAS=ON -DAOCL_ROOT=<path to AOCL-BLAS> ...
The path specified in AOCL_ROOT must contain the directories include
and lib, which hold the necessary header files and the AOCL-BLAS
library binary respectively.
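Because a wrong AOCL_ROOT typically surfaces only as a configure or link failure, it can help to check the directory layout first. The following is a minimal sketch: `check_aocl_root` is a hypothetical helper, not part of AOCL-LAPACK, and it only confirms that the include and lib directories exist, not that they contain the expected headers or library files.

```shell
#!/bin/sh
# check_aocl_root is a hypothetical helper (not part of AOCL-LAPACK).
# It verifies that the directory in $1 has the include/ and lib/
# subdirectories that AOCL_ROOT is expected to contain.
check_aocl_root() {
    root=$1
    if [ -z "$root" ]; then
        echo "AOCL_ROOT is not set" >&2
        return 1
    fi
    for sub in include lib; do
        if [ ! -d "$root/$sub" ]; then
            echo "missing directory: $root/$sub" >&2
            return 1
        fi
    done
    echo "AOCL_ROOT layout looks valid: $root"
}

# Example: validate the environment variable before running cmake.
check_aocl_root "${AOCL_ROOT:-}" || echo "fix AOCL_ROOT before configuring"
```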
3.6. Using Extended BLAS APIs
As mentioned in Section 3.1, the use of Extended BLAS APIs is enabled by
setting ENABLE_AMD_FLAGS or ENABLE_AMD_AOCC_FLAGS to ON. To disable this
feature, set the CMake option ENABLE_BLAS_EXT_GEMMT to OFF:
$ cmake -DENABLE_AMD_FLAGS=ON -DENABLE_BLAS_EXT_GEMMT=OFF ...