4.4.4. Thread Count Considerations

AOCL User Guide (57404)

Document ID: 57404
Release Date: 2025-12-29
Version: 5.2 English

AOCL supports multi-threaded execution for many of its math kernels. You can control the number of threads via environment variables such as OMP_NUM_THREADS or through library-specific APIs.
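
As a minimal sketch of the environment-variable approach, the following sets OMP_NUM_THREADS for the current shell session. The value 16 and the application name ./my_app are placeholders chosen for illustration, not values taken from this guide.

```shell
# Request 16 threads for OpenMP-enabled kernels in this shell session.
# 16 is an arbitrary example value.
export OMP_NUM_THREADS=16

# Show the value that child processes will inherit.
echo "OMP_NUM_THREADS=${OMP_NUM_THREADS}"

# ./my_app   # hypothetical application linked against a multi-threaded AOCL library
```

Because the variable is exported, any program started from the same shell afterwards inherits the setting.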

Specifying Prime Number of Threads

If you specify a prime number of threads greater than 11 (e.g., 13, 17, 19), AOCL internally reduces the thread count by 1. This enables efficient 2D parallelization, which relies on factorizing the thread count for optimal workload distribution; a prime count p has only the trivial factorization 1 x p.

Specifying a prime thread count can therefore lead to:

  • Reduced thread utilization (e.g., 13 becomes 12 internally)

  • Potential performance degradation due to uneven workload partitioning
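
The adjustment described above can be modeled with a short shell sketch. This is only an illustration of the stated rule (primes greater than 11 are reduced by 1), not AOCL's actual implementation. The function names are hypothetical, and the "most square" factor pair is an assumption used to show why a composite count factorizes better; the factors AOCL actually chooses may differ.

```shell
#!/usr/bin/env bash
# Illustrative model of the rule described in this section; not AOCL code.

# is_prime N: succeeds (exit status 0) if N is prime.
is_prime() {
  local n=$1 d
  if (( n < 2 )); then
    return 1
  fi
  for (( d = 2; d * d <= n; d++ )); do
    if (( n % d == 0 )); then
      return 1
    fi
  done
  return 0
}

# effective_threads N: prints the thread count expected after the adjustment.
# Primes greater than 11 are reduced by 1; all other values are unchanged.
effective_threads() {
  local n=$1
  if (( n > 11 )) && is_prime "$n"; then
    echo $(( n - 1 ))
  else
    echo "$n"
  fi
}

# squarest_grid N: prints the factor pair "a x b" of N with a <= b
# and a as large as possible (assumed here to stand in for a 2D thread grid).
squarest_grid() {
  local n=$1 a best=1
  for (( a = 1; a * a <= n; a++ )); do
    if (( n % a == 0 )); then
      best=$a
    fi
  done
  echo "$best x $(( n / best ))"
}

for n in 11 12 13 16 17; do
  eff=$(effective_threads "$n")
  echo "requested=$n effective=$eff grid=$(squarest_grid "$eff")"
done
```

In this model, a request for 13 threads maps to 12, which factorizes as 3 x 4, whereas 13 itself only factorizes as 1 x 13.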

Workaround

To keep the prime thread count, set the following environment variable:

export BLIS_JC_NT=<prime_number>

Note that performance may still be suboptimal with this workaround, because the internal parallelization strategies may not be fully effective with prime thread counts.

Recommendation

For best performance, use thread counts that are:

  • Powers of two (e.g., 4, 8, 16)

  • Divisible by the number of physical cores

  • Composite numbers that allow better factorization
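
One way to apply these recommendations is to derive a candidate thread count from an upper limit. The sketch below is a hypothetical helper script, not part of AOCL; the limit of 13 is an assumed example value. It covers only the power-of-two and composite criteria.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for choosing a thread count; not part of AOCL.

# largest_pow2_le N: prints the largest power of two that is <= N (N >= 1).
largest_pow2_le() {
  local n=$1 p=1
  while (( p * 2 <= n )); do
    p=$(( p * 2 ))
  done
  echo "$p"
}

# is_composite N: succeeds if N has a divisor other than 1 and itself.
is_composite() {
  local n=$1 d
  for (( d = 2; d * d <= n; d++ )); do
    if (( n % d == 0 )); then
      return 0
    fi
  done
  return 1
}

# largest_composite_le N: prints the largest composite number <= N.
# Prints nothing and fails if there is none (N < 4).
largest_composite_le() {
  local n=$1
  while (( n >= 4 )); do
    if is_composite "$n"; then
      echo "$n"
      return 0
    fi
    n=$(( n - 1 ))
  done
  return 1
}

budget=13   # assumed upper limit on threads, for illustration only
echo "power of two: $(largest_pow2_le "$budget")"
echo "composite:    $(largest_composite_le "$budget")"
```

Either result could then be exported as OMP_NUM_THREADS; which one performs better depends on the workload and hardware and should be measured.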