3. AOCL-LAPACK
AOCL-LAPACK provides several build-time and run-time options that improve performance for different use cases. Multi-threading, x86 ISA selection, and AOCL-specific optimizations are the most prominent of these. In general, setting the flag ENABLE_AMD_FLAGS to ON during CMake configuration turns on many optimization and interface options. The following subsections describe these options and their effect on the performance of AOCL-LAPACK. These options are enabled or disabled either at configure time, before the build, or at run time through the corresponding environment variables.
3.1. Enable AMD Optimizations
All performance optimizations and other library features added by AOCL are enabled by setting either ENABLE_AMD_FLAGS or ENABLE_AMD_AOCC_FLAGS to ON for the GCC or AOCC compiler, respectively.
These options turn on the following salient features:
- AOCL performance optimizations for the Zen family of CPUs
- Shared-memory parallelization using OpenMP
- Use of the Extended BLAS APIs available in AOCL-BLAS
3.2. Enable / Disable Multithreading
AOCL-LAPACK supports multi-threading using OpenMP in selected APIs. This feature is enabled by default when AOCL-LAPACK is compiled with ENABLE_AMD_FLAGS=ON or ENABLE_AMD_AOCC_FLAGS=ON. However, you can disable multi-threading by setting ENABLE_MULTITHREADING=NO.
The LAPACK interface APIs that support multi-threading automatically choose an optimal number of threads. However, you can explicitly set the number of threads through the environment variable or the OpenMP runtime APIs. In that case, the number of threads is selected as follows:
| Thread Criteria | Threads Used by API |
|---|---|
| If user-specified threads are greater than AOCL-LAPACK computed optimal threads | AOCL-LAPACK computed optimal threads |
| If user-specified threads are less than AOCL-LAPACK computed optimal threads | User-specified threads |
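The rule in the table above amounts to taking the smaller of the two thread counts. The following is a minimal sketch of that rule only; effective_threads is a hypothetical helper written for illustration and is not part of AOCL-LAPACK, and the optimal thread count it takes as input is computed internally by the library.

```shell
# Hypothetical helper modeling the thread-selection table; not library code.
# $1 = user-specified threads, $2 = AOCL-LAPACK computed optimal threads
effective_threads() {
  if [ "$1" -gt "$2" ]; then
    # More threads requested than the library considers optimal:
    # the library's own count is used.
    echo "$2"
  else
    # The request does not exceed the optimal count: it is honored.
    echo "$1"
  fi
}
```

For example, assuming the library computes 8 optimal threads for a call, a request for 16 threads would run with 8, while a request for 4 would run with 4. With an OpenMP runtime, the request is commonly made through the standard OMP_NUM_THREADS environment variable.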
3.3. Build-Time ISA Selection
To support binary portability across different architectures, the default compiler flags are set to -mtune=native -mavx2 -mfma -O3 when AOCL-LAPACK is compiled with the ENABLE_AMD_FLAGS or ENABLE_AMD_AOCC_FLAGS option. This means that AOCL-LAPACK requires, at minimum, AVX2 and Fused Multiply Accumulate (FMA) support from the target CPU.
However, the library can be compiled with a different ISA flag, such as AVX512, depending on the ISA supported by the target CPU. To do so, set the flag LF_ISA_CONFIG to the desired ISA. The available options are Auto, AVX2 (default), AVX512, and None. For example:
$ cmake .. -DLF_ISA_CONFIG=AVX512 -DENABLE_AMD_FLAGS=ON
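One way to choose a value for LF_ISA_CONFIG is to inspect the CPU feature flags of the target machine. The sketch below assumes a Linux host where /proc/cpuinfo lists the flags, and it only distinguishes AVX512 from the documented default AVX2; pick_isa_config is a hypothetical helper, not an AOCL tool.

```shell
# Hypothetical helper; not part of AOCL-LAPACK.
# $1 = space-separated CPU feature flags (for example, from /proc/cpuinfo)
pick_isa_config() {
  case " $1 " in
    *" avx512f "*) echo AVX512 ;;  # AVX-512 Foundation is present
    *)             echo AVX2 ;;    # fall back to the documented default
  esac
}

# Possible usage on Linux (assumes the build host is also the target CPU):
#   flags=$(grep -m1 '^flags' /proc/cpuinfo | cut -d: -f2)
#   cmake .. -DLF_ISA_CONFIG="$(pick_isa_config "$flags")" -DENABLE_AMD_FLAGS=ON
```

If the library is built on one machine and deployed on another, the flags of the deployment machine are the ones that matter.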
3.4. Run-Time ISA Selection
For select functions, AOCL-LAPACK supports automatic dispatching to a suitable code path based on the ISA of the target CPU. However, you can select a different ISA code path using the environment variable AOCL_ENABLE_INSTRUCTIONS. Valid values for AOCL_ENABLE_INSTRUCTIONS are SSE2, AVX, AVX2, AVX512, and GENERIC. All values are case-insensitive.
When you set AOCL_ENABLE_INSTRUCTIONS to an ISA higher than the one supported by the target CPU, AOCL-LAPACK chooses the best code path that the target CPU supports. If you choose a lower-level ISA, that ISA is used. Any ISA selection lower than AVX2 defaults to the generic reference code path.
Case 1: On an AVX2-only machine (example: AMD Zen1 / Zen2 / Zen3)
- Setting AOCL_ENABLE_INSTRUCTIONS=AVX2 takes the AVX2 path.
- Setting AOCL_ENABLE_INSTRUCTIONS=AVX512 takes the AVX2 path.
- Setting AOCL_ENABLE_INSTRUCTIONS=generic, sse2, or avx takes the reference path.
Case 2: On an AVX512 machine (example: Zen4 / Zen5)
- Setting AOCL_ENABLE_INSTRUCTIONS=AVX512 takes the AVX512 path.
- Setting AOCL_ENABLE_INSTRUCTIONS=AVX2 takes the AVX2 path.
- Setting AOCL_ENABLE_INSTRUCTIONS=generic, sse2, or avx takes the reference path.
Case 3: Setting AOCL_ENABLE_INSTRUCTIONS to any value other than avx512, avx2, avx, sse2, or generic results in an error.
Performance varies based on the function and size of the inputs.
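The selection rules in this section can be summarized as a small decision function. This is a sketch of the documented behavior only, not the library's dispatch code; resolve_code_path is a hypothetical name, and the second argument (the highest ISA the CPU supports, avx2 or avx512) stands in for the CPU detection that AOCL-LAPACK performs internally.

```shell
# Hypothetical model of the documented rules; not library code.
# $1 = requested AOCL_ENABLE_INSTRUCTIONS value (case-insensitive)
# $2 = highest ISA supported by the CPU: avx2 or avx512
resolve_code_path() {
  requested=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  cpu_max=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')
  case "$requested" in
    avx512)
      # A request above what the CPU supports falls back to AVX2.
      if [ "$cpu_max" = "avx512" ]; then echo avx512; else echo avx2; fi ;;
    avx2)
      echo avx2 ;;
    generic|sse2|avx)
      # Anything below AVX2 uses the generic reference path.
      echo reference ;;
    *)
      echo "error: invalid value '$1'" >&2
      return 1 ;;
  esac
}
```

In practice the variable is set in the environment of the application that links AOCL-LAPACK, for example by running `export AOCL_ENABLE_INSTRUCTIONS=AVX2` in the shell before launching it.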
3.5. Using AOCL-BLAS
AOCL-LAPACK can be linked with any Netlib BLAS-compliant library when compiled with the standard CMake options described in the AOCL User Guide. However, AOCL-LAPACK also provides an option to link explicitly with the AOCL-BLAS library at compile time. This option lets AOCL-LAPACK invoke lower-level AOCL-BLAS APIs directly, which could result in better performance for certain APIs on AMD “Zen” CPUs. To force AOCL-LAPACK to use the AOCL-BLAS library, provide the option ENABLE_AOCL_BLAS in the CMake configuration:
$ cmake -DENABLE_AMD_AOCC_FLAGS=ON -DENABLE_AOCL_BLAS=ON ...
Provide the path of the AOCL-BLAS library using one of the following methods:
- Set the AOCL_ROOT environment variable to the root path where the AOCL-BLAS library ($AOCL_ROOT/lib) and header files ($AOCL_ROOT/include) are located:
  $ export AOCL_ROOT=<path to AOCL-BLAS>
- Specify the root path of the AOCL-BLAS library through the CMake option AOCL_ROOT:
  $ cmake -DENABLE_AMD_AOCC_FLAGS=ON -DENABLE_AOCL_BLAS=ON -DAOCL_ROOT=<path to AOCL-BLAS> ...
The path specified in AOCL_ROOT must contain the directories include and lib, which hold the header files and the binary of AOCL-BLAS, respectively.
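Because the configuration depends on this directory layout, it can help to verify it before running CMake. The following is a minimal sketch; check_aocl_root is a hypothetical helper and not part of AOCL, and it checks only that the two directories exist, not that they contain the expected headers or binaries.

```shell
# Hypothetical pre-flight check; not part of AOCL.
# $1 = candidate AOCL_ROOT path
check_aocl_root() {
  root="$1"
  if [ -z "$root" ]; then
    echo "AOCL_ROOT is not set" >&2
    return 1
  fi
  # The documentation requires both include/ and lib/ under the root.
  for sub in include lib; do
    if [ ! -d "$root/$sub" ]; then
      echo "missing directory: $root/$sub" >&2
      return 1
    fi
  done
  echo "ok: $root"
}
```

A typical call would be `check_aocl_root "$AOCL_ROOT"` just before the cmake command.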
3.6. Using Extended BLAS APIs
As mentioned earlier, use of the Extended BLAS APIs is enabled by setting ENABLE_AMD_FLAGS or ENABLE_AMD_AOCC_FLAGS to ON. To disable this feature, use the CMake option ENABLE_BLAS_EXT_GEMMT:
$ cmake -DENABLE_AMD_FLAGS=ON -DENABLE_BLAS_EXT_GEMMT=OFF ...