Multi-threaded builds of AOCL-BLAS provide several mechanisms for setting the desired number of threads during initialization and runtime. An explanation follows.
Runtime Thread Control
AOCL-BLAS libraries that are multi-threaded using OpenMP provide two
mechanisms for controlling the number of threads that AOCL-BLAS
functions use: the standard OpenMP mechanisms, and AOCL-BLAS specific
environment variables and function calls. The AOCL-BLAS specific
mechanisms can either set the overall number of threads for AOCL-BLAS
to use, or set the threading individually for the different loops
within the AOCL-BLAS Level 3 routines (for example, DGEMM). These are
termed the automatic and manual ways, respectively. For more
information, refer to:
Multithreading.md
The order of precedence used in AOCL-BLAS is as follows:
1. Manual way values set using bli_thread_set_ways() by the application.
2. Valid value(s) of any of the BLIS_*_NT environment variables.
3. Value set using bli_thread_set_num_threads(nt) by the application.
4. Valid value set for the environment variable BLIS_NUM_THREADS.
5. omp_set_num_threads(nt) issued by the application.
6. Valid value set for the environment variable OMP_NUM_THREADS.
7. The default number of threads used by the chosen OpenMP runtime library when OMP_NUM_THREADS is not set.
Two other factors may override these settings:
1. OpenMP parallelism at higher level(s) in the code calling AOCL-BLAS, that is, the number of active levels and the level at which the AOCL-BLAS call occurs. Regardless of any BLIS-specific settings, the AOCL-BLAS call will be serial if the OpenMP level at which it is called is not active.
2. The effect of AOCL Dynamic (if enabled), as described in the next section.
Previously, if bli_thread_set_ways() or bli_thread_set_num_threads()
were used, it was not possible to revert to the original settings (from
program startup) or to use omp_set_num_threads() to alter threading
settings for subsequent AOCL-BLAS calls. To resolve this, in AOCL 5.2, the
API bli_thread_reset() was added. It reverts the internal threading data
to that which existed when the program was launched, subject to (where appropriate)
any changes in the OpenMP ICVs (e.g. via omp_set_num_threads()). If the
environment variable BLIS_NUM_THREADS was used, this will NOT be cleared,
as the initial state of the program is restored.
Note
From AOCL 4.1, support for calling AOCL-BLAS within nested OpenMP parallelism has been improved. Hence, the standard OpenMP mechanisms should be sufficient for most use cases.
Note
Specifying a greater number of threads than the number of cores may result in deteriorated performance because of over-subscription of cores.
Usage Examples
The following tables describe sample scenarios for setting the number of threads during AOCL-BLAS initialization. Each table applies to the code shown above it:
int main()
{
    // pseudo code to use OpenMP API to set number of threads
    omp_set_num_threads(16);
    dgemm_( );
    // ...
    return 0;
}
| Sample Command Executed | No of Threads Set During AOCL-BLAS Initialization | Remarks |
|---|---|---|
| | 8 | |
| | 16 | |
| | 16 | |
| | 8 | |
int main()
{
    // pseudo code
    dgemm_( );
    // ...
    return 0;
}
| Sample Command Executed | No of Threads Set During AOCL-BLAS Initialization | Remarks |
|---|---|---|
| | 8 | |
| | 64 | |
| | 4 | |
Once the number of threads is set during AOCL-BLAS initialization, it
will be used in subsequent BLAS routine executions until the application
modifies the number of threads to be used (for example, using the
omp_set_num_threads() API).
The following table describes the sample scenarios for setting the number of threads during runtime:
int main()
{
    // Pseudo code for sample usage of OpenMP API to set
    // number of threads in the Application during runtime
    do {
        // Later checks override earlier ones, so the largest
        // threshold that m satisfies decides the thread count.
        if (m < 500)
            omp_set_num_threads(8);
        if (m >= 500)
            omp_set_num_threads(16);
        if (m >= 3000)
            omp_set_num_threads(32);
        dgemm_( );
    } while (test_case_counter--);
    // ...
    return 0;
}
| Sample Command Executed | m | Number of Threads for this BLAS Call | Remarks |
|---|---|---|---|
| | 100 | 8 | Application issued |
| | 500 | 16 | Application issued |
| | 200 | 8 | Application re-issued |
| | 4000 | 32 | Application issued |
| | 1000 | 16 | Application re-issued |
| | 500 | 16 | Application re-issued |
| | 100 | 8 | Application re-issued |
The following program demonstrates use of bli_thread_reset().
int main()
{
    // pseudo code to demonstrate use of bli_thread_reset()
    omp_set_num_threads(16);
    dgemm_( );  // Uses 16 threads
    bli_thread_set_num_threads(4);
    dgemm_( );  // Changed to use 4 threads
    omp_set_num_threads(11);
    dgemm_( );  // Will still use 4 threads as BLIS-specific thread
                // setting takes precedence over OpenMP setting
    bli_thread_reset();
    dgemm_( );  // After reset, will now use 11 threads, as set
                // previously by omp_set_num_threads()
    return 0;
}