8.5.2. Multi-Threading

AOCL User Guide (57404)

Document ID: 57404
Release Date: 2025-12-29
Version: 5.2 English

AOCL-DLP supports multiple threading models that can be configured at build time and controlled at runtime.

Threading Model Configuration:

The threading model is selected at build time with the DLP_THREADING_MODEL CMake option, which accepts two values:

  • none (default): Single-threaded execution, no parallel processing

  • openmp: OpenMP-based threading for parallel GEMM operations

# Configure with OpenMP threading (recommended for performance)
$ cmake -B build -DDLP_THREADING_MODEL=openmp

# Configure single-threaded (default)
$ cmake -B build -DDLP_THREADING_MODEL=none

Threading Model Details:

  • Setting DLP_THREADING_MODEL=openmp automatically enables OpenMP support

  • For custom OpenMP installations, use -DDLP_OPENMP_ROOT=/path/to/openmp

  • The DLP_ENABLE_OPENMP option provides additional control over OpenMP
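
As a minimal sketch of how the options above combine, the following script assembles a configure command for an OpenMP build against a custom OpenMP installation. The install prefix /opt/custom-openmp and the variable names openmp_root and configure_cmd are placeholders chosen for illustration, not values from the guide. The script only prints the command (a dry run), so it can be inspected before anything is executed.

```shell
#!/bin/sh
# Hypothetical install prefix of a custom OpenMP runtime; adjust for your system.
openmp_root="${openmp_root:-/opt/custom-openmp}"

# Combine the two documented options into one configure command.
configure_cmd="cmake -B build -DDLP_THREADING_MODEL=openmp -DDLP_OPENMP_ROOT=$openmp_root"

# Dry run: print the command instead of executing it.
echo "$configure_cmd"
```

To actually configure, paste the printed command into the shell from the source directory.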

Thread Control Precedence:

AOCL-DLP determines the number of threads for GEMM operations from the following sources, listed from highest to lowest precedence:

  1. API calls - dlp_thread_set_num_threads() or dlp_thread_set_ways()

  2. DLP_NUM_THREADS - Library-specific environment variable

  3. OpenMP API - omp_set_num_threads() (when OpenMP is enabled)

  4. OMP_NUM_THREADS - OpenMP environment variable

  5. System default - Number of available CPU cores
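
The environment-variable part of this ordering can be illustrated with a small shell function. This is only a sketch of the documented order, not the library's own logic: the function name resolve_env_threads is hypothetical, and settings made through API calls (items 1 and 3) happen inside the program and cannot be observed from the shell.

```shell
#!/bin/sh
# Sketch: report which thread count the environment would suggest,
# following items 2, 4 and 5 of the precedence list.
resolve_env_threads() {
    if [ -n "${DLP_NUM_THREADS:-}" ]; then
        # Library-specific variable wins over the OpenMP variable.
        echo "$DLP_NUM_THREADS"
    elif [ -n "${OMP_NUM_THREADS:-}" ]; then
        echo "$OMP_NUM_THREADS"
    else
        # Fall back to the number of online CPUs.
        getconf _NPROCESSORS_ONLN 2>/dev/null || nproc
    fi
}

echo "Thread count suggested by the environment: $(resolve_env_threads)"
```

For example, with DLP_NUM_THREADS=4 and OMP_NUM_THREADS=8 both set, the function prints 4, because the library-specific variable is listed ahead of the OpenMP one.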

Performance Recommendations:

For Multi-Socket Systems:

  • Use OMP_PROC_BIND=close with numactl --cpunodebind to bind to specific NUMA nodes

  • Set OMP_PLACES=cores for fine-grained thread control

  • Use numactl --interleave for memory interleaving across NUMA nodes

  • Set OMP_WAIT_POLICY=active for optimal performance

  • Consider binding to the second socket, which performs better on some machines (machine dependent, so compare both sockets)
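
A launch script for a multi-socket machine might look like the sketch below. The binary name ./gemm_app and the variables app and node are placeholders for illustration, and NUMA node 0 is an arbitrary choice. The script exports the OpenMP settings listed above and prints two alternative numactl command lines rather than running them.

```shell
#!/bin/sh
# OpenMP placement and wait-policy settings from the recommendations above.
export OMP_PROC_BIND=close
export OMP_PLACES=cores
export OMP_WAIT_POLICY=active

# Hypothetical application binary and NUMA node; adjust for your system.
app="${app:-./gemm_app}"
node="${node:-0}"

# Alternative A: run on the CPUs of a single NUMA node.
bind_cmd="numactl --cpunodebind=$node $app"

# Alternative B: interleave memory allocations across all NUMA nodes.
interleave_cmd="numactl --interleave=all $app"

# Dry run: print both command lines for inspection.
echo "$bind_cmd"
echo "$interleave_cmd"
```

Which alternative is faster depends on the machine and the problem size, so it is worth timing both.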

For Single-Socket Systems:

  • Use OMP_PROC_BIND=close to keep threads near each other

  • Set OMP_PLACES=cores for better cache locality

  • Set OMP_WAIT_POLICY=active for reduced thread wake-up overhead

For Memory-Bound Workloads:

  • Consider using fewer threads than physical cores, since memory bandwidth rather than core count tends to limit throughput

  • Set OMP_WAIT_POLICY=active for consistent thread responsiveness

  • Ensure proper memory alignment and layout

For Compute-Bound Workloads:

  • Use all available cores

  • Set OMP_WAIT_POLICY=active to reduce thread wake-up overhead

  • Test with and without hyperthreading, since its benefit depends on the specific workload
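
The thread-count advice for memory-bound and compute-bound workloads can be turned into a starting point with a few lines of shell. This is a sketch under stated assumptions: the helper names online_cpus, compute_threads and memory_threads are hypothetical, and halving the CPU count for memory-bound work is an arbitrary first guess to tune from, not a value given in the guide. Note that the CPU count reported here is of logical CPUs, so it includes hyperthreads when they are enabled.

```shell
#!/bin/sh
# Number of online logical CPUs (includes SMT siblings if enabled).
online_cpus() {
    getconf _NPROCESSORS_ONLN 2>/dev/null || nproc
}

# Compute-bound starting point: use every online CPU.
compute_threads() {
    online_cpus
}

# Memory-bound starting point: half the online CPUs, never below 1.
# The factor of one half is an assumption to tune, not a documented value.
memory_threads() {
    n=$(online_cpus)
    half=$(( n / 2 ))
    if [ "$half" -lt 1 ]; then
        half=1
    fi
    echo "$half"
}

export OMP_WAIT_POLICY=active
# Example: configure for a memory-bound run.
export DLP_NUM_THREADS="$(memory_threads)"
echo "DLP_NUM_THREADS=$DLP_NUM_THREADS"
```

For a compute-bound run, assign the output of compute_threads instead and compare timings against lower counts.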