4.4.1. AOCL-BLAS Thread Control - 5.2 English - 57404

AOCL User Guide (57404)

Document ID: 57404
Release Date: 2025-12-29
Version: 5.2 English

Multi-threaded builds of AOCL-BLAS provide several mechanisms for setting the desired number of threads during initialization and runtime. An explanation follows.

Runtime Thread Control

AOCL-BLAS libraries built with OpenMP multithreading provide two ways to control the number of threads that AOCL-BLAS functions use: the standard OpenMP mechanisms, and AOCL-BLAS-specific environment variables and function calls. The AOCL-BLAS-specific mechanisms can either set the overall number of threads for AOCL-BLAS to use (the automatic way) or set the threading for each of the loops within the AOCL-BLAS Level 3 routines, for example DGEMM (the manual way). For more information, refer to Multithreading.md.

The order of precedence used in AOCL-BLAS is as follows:

  1. Manual way values set using bli_thread_set_ways() by the application.

  2. Valid value(s) of any of the BLIS_*_NT environment variables.

  3. Value set using bli_thread_set_num_threads(nt) by the application.

  4. Valid value set for the environment variable BLIS_NUM_THREADS.

  5. omp_set_num_threads(nt) issued by the application.

  6. Valid value set for the environment variable OMP_NUM_THREADS.

  7. The default number of threads used by the chosen OpenMP runtime library when OMP_NUM_THREADS is not set.
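The order above can be sketched as a "first setting that is present wins" lookup. The following is only an illustrative model, not AOCL-BLAS code: the struct `thread_settings`, its fields, and the function `resolve_num_threads()` are hypothetical names, and it assumes that the manual ways (items 1 and 2) have already been reduced to a single total thread count.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the settings in the precedence list.
   A value of 0 means "not set". None of these names are AOCL-BLAS API. */
typedef struct {
    int ways_api;    /* 1. total implied by bli_thread_set_ways() (assumed precomputed) */
    int ways_env;    /* 2. total implied by BLIS_*_NT variables (assumed precomputed) */
    int blis_api;    /* 3. bli_thread_set_num_threads(nt) */
    int blis_env;    /* 4. BLIS_NUM_THREADS */
    int omp_api;     /* 5. omp_set_num_threads(nt) */
    int omp_env;     /* 6. OMP_NUM_THREADS */
    int omp_default; /* 7. default chosen by the OpenMP runtime */
} thread_settings;

/* Return the first setting that is present, scanning in precedence order. */
int resolve_num_threads(const thread_settings *s)
{
    assert(s != NULL);
    const int ordered[] = {
        s->ways_api, s->ways_env, s->blis_api, s->blis_env,
        s->omp_api,  s->omp_env,  s->omp_default
    };
    for (size_t i = 0; i < sizeof ordered / sizeof ordered[0]; ++i) {
        if (ordered[i] > 0)
            return ordered[i];
    }
    return 1; /* nothing set at all: fall back to serial in this model */
}
```

For instance, filling in only `blis_env = 8` and `omp_api = 16` makes the model return 8, because an environment value for BLIS_NUM_THREADS outranks a call to omp_set_num_threads().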

Two other factors may override these settings:

  1. OpenMP parallelism at higher levels in the code that calls AOCL-BLAS, that is, the number of active levels and the level at which the AOCL-BLAS call occurs. Regardless of any BLIS-specific settings, the AOCL-BLAS call runs serially if the OpenMP level at which it is called is not active.

  2. The effect of AOCL Dynamic (if enabled), as described in the next section.

Previously, once bli_thread_set_ways() or bli_thread_set_num_threads() had been used, it was not possible to revert to the original settings (from program startup) or to use omp_set_num_threads() to alter threading settings for subsequent AOCL-BLAS calls. To resolve this, AOCL 5.2 added the API bli_thread_reset(). It reverts the internal threading data to the state that existed when the program was launched, subject (where appropriate) to any changes in the OpenMP ICVs (internal control variables), for example via omp_set_num_threads(). If the environment variable BLIS_NUM_THREADS was set, its value is NOT cleared, because the reset restores the initial state of the program.

Note

From AOCL 4.1, support for calling AOCL-BLAS within nested OpenMP parallelism has been improved. Hence, using the standard OpenMP mechanisms should be sufficient for most of the use cases.

Note

Specifying a greater number of threads than the number of cores may result in deteriorated performance because of over-subscription of cores.
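One way to guard against over-subscription is to clamp the requested thread count before passing it to a thread-setting call. The sketch below is an assumption-laden illustration: `clamp_threads()` and `online_processors()` are hypothetical helpers, and `sysconf(_SC_NPROCESSORS_ONLN)` (available on Linux and most Unix-like systems) reports online logical processors, which can exceed the number of physical cores when SMT is enabled and does not account for CPU affinity restrictions.

```c
#include <assert.h>
#include <unistd.h>

/* Clamp a requested thread count to the range [1, available]. */
int clamp_threads(int requested, long available)
{
    assert(available >= 1);
    if (requested < 1)
        return 1;
    if (requested > available)
        return (int)available;
    return requested;
}

/* Number of online logical processors; falls back to 1 if the query fails. */
long online_processors(void)
{
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    return n > 0 ? n : 1;
}
```

The result of `clamp_threads(nt, online_processors())` could then be passed to omp_set_num_threads() or bli_thread_set_num_threads() in place of the raw request.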

Usage Examples

The following tables describe sample scenarios for setting the number of threads during AOCL-BLAS initialization. Each table applies to the code shown before it:

int main()
{
   // pseudo code to use OpenMP API to set number of threads

   omp_set_num_threads(16);
   dgemm_( );
   // ...
   return 0;
}

| Sample Command Executed | Number of Threads Set During AOCL-BLAS Initialization | Remarks |
|---|---|---|
| $ BLIS_NUM_THREADS=8 ./my_blis_program | 8 | BLIS_NUM_THREADS takes precedence over omp_set_num_threads(16). |
| $ ./my_blis_program | 16 | BLIS_NUM_THREADS is not set, so omp_set_num_threads(16) takes effect. |
| $ OMP_NUM_THREADS=4 ./my_blis_program | 16 | BLIS_NUM_THREADS is not set; omp_set_num_threads(16) takes effect because it has higher precedence than OMP_NUM_THREADS. |
| $ BLIS_NUM_THREADS=8 OMP_NUM_THREADS=4 ./my_blis_program | 8 | BLIS_NUM_THREADS is set to 8, so omp_set_num_threads(16) and OMP_NUM_THREADS have no effect. |

int main()
{
   // pseudo code

   dgemm_( );
   // ...
   return 0;
}

| Sample Command Executed | Number of Threads Set During AOCL-BLAS Initialization | Remarks |
|---|---|---|
| $ BLIS_NUM_THREADS=8 ./my_blis_program | 8 | BLIS_NUM_THREADS is the only setting present, so it determines the number of threads. |
| $ ./my_blis_program | 64 | BLIS_NUM_THREADS is not set, omp_set_num_threads() is not issued, and OMP_NUM_THREADS is not set, so the OpenMP runtime default applies: 64 on a system with 64 logical cores, or the number of cores selected with the numactl --physcpubind=<...> option. |
| $ OMP_NUM_THREADS=4 ./my_blis_program | 4 | BLIS_NUM_THREADS is not set and omp_set_num_threads() is not issued, so OMP_NUM_THREADS=4 takes effect. |

Once the number of threads is set during AOCL-BLAS initialization, it will be used in subsequent BLAS routine executions until the application modifies the number of threads to be used (for example, using the omp_set_num_threads() API).

The following table describes the sample scenarios for setting the number of threads during runtime:

int main()
{
   // Pseudo code for sample usage of OpenMP API to set
   // number of threads in the Application during runtime

   do {
      if (m < 500)
         omp_set_num_threads(8);
      if (m >= 500)
         omp_set_num_threads(16);
      if (m >= 3000)
         omp_set_num_threads(32);

      dgemm_( );
   } while(test_case_counter--);
   // ...
   return 0;
}

| Sample Command Executed | m | Number of Threads for this BLAS Call | Remarks |
|---|---|---|---|
| $ ./my_blis_program | 100 | 8 | Application issued omp_set_num_threads(8) |
| | 500 | 16 | Application issued omp_set_num_threads(16) |
| | 200 | 8 | Application re-issued omp_set_num_threads(8) |
| | 4000 | 32 | Application issued omp_set_num_threads(32) |
| | 1000 | 16 | Application re-issued omp_set_num_threads(16) |
| | 500 | 16 | Application re-issued omp_set_num_threads(16) |
| | 100 | 8 | Application re-issued omp_set_num_threads(8) |
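The size-based selection used in the pseudo code above can be factored into a small helper so that the thresholds live in one place. This is a minimal sketch: `threads_for_m()` is a hypothetical name, and the thresholds (500 and 3000) and thread counts (8, 16, 32) are simply those of the example, not tuned recommendations.

```c
#include <assert.h>

/* Thread count for problem dimension m, using the example's thresholds:
   m < 500 -> 8, 500 <= m < 3000 -> 16, m >= 3000 -> 32. */
int threads_for_m(int m)
{
    assert(m >= 0);
    if (m >= 3000)
        return 32;
    if (m >= 500)
        return 16;
    return 8;
}
```

In the loop, the three `if` statements would then collapse to a single call such as `omp_set_num_threads(threads_for_m(m));` issued before each BLAS call.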

The following program demonstrates use of bli_thread_reset().

int main()
{
   // pseudo code to demonstrate use of bli_thread_reset()

   omp_set_num_threads(16);
   dgemm_( ); // Uses 16 threads

   bli_thread_set_num_threads(4);
   dgemm_( ); // Changed to use 4 threads

   omp_set_num_threads(11);
   dgemm_( ); // Will still use 4 threads as BLIS-specific thread
              // setting takes precedence over OpenMP setting

   bli_thread_reset();
   dgemm_( ); // After reset, will now use 11 threads, as set
              // previously by omp_set_num_threads()

   return 0;
}
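The sequence in the program above, together with the case where BLIS_NUM_THREADS is set in the environment, can be traced with a toy state model. This is a hedged sketch of the behaviour described in this section, not the library's implementation: `thread_state` and the `model_*` functions are hypothetical, and the manual ways settings are left out for brevity.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model, not AOCL-BLAS code. A value of 0 means "not set". */
typedef struct {
    int blis_env; /* BLIS_NUM_THREADS read at program startup */
    int blis_api; /* last value given to bli_thread_set_num_threads() */
    int omp_nt;   /* current OpenMP thread-count setting */
} thread_state;

/* Models bli_thread_set_num_threads(nt). */
void model_set_blis(thread_state *st, int nt)
{
    assert(st != NULL && nt > 0);
    st->blis_api = nt;
}

/* Models omp_set_num_threads(nt). */
void model_set_omp(thread_state *st, int nt)
{
    assert(st != NULL && nt > 0);
    st->omp_nt = nt;
}

/* Models bli_thread_reset(): drops the API-set value only.
   The startup environment value and the OpenMP setting survive. */
void model_reset(thread_state *st)
{
    assert(st != NULL);
    st->blis_api = 0;
}

/* Thread count the next BLAS call would use in this model. */
int model_threads(const thread_state *st)
{
    assert(st != NULL);
    if (st->blis_api > 0)
        return st->blis_api;
    if (st->blis_env > 0)
        return st->blis_env;
    return st->omp_nt;
}
```

Replaying the program's calls against a state with no BLIS_NUM_THREADS value gives 16, 4, 4, and 11 in this model. Starting instead from `blis_env = 8`, a reset returns the model to 8 rather than to the OpenMP setting, which mirrors the statement that BLIS_NUM_THREADS is not cleared.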