10.5. Performance Tuning

AOCL User Guide (57404)

Document ID: 57404
Release Date: 2025-12-29
Version: 5.2 English

Use the following tuning guidelines to get the best performance out of FFTZ:

  • Build with the -DENABLE_MULTI_THREADING=ON CMake option to enable OpenMP-based multi-threading support, provided the application allows the library to use multiple threads.

  • Setting the OpenMP environment variables OMP_PROC_BIND to TRUE and OMP_PLACES to cores may improve performance for multi-threaded runs. This can be done as shown below:

export OMP_PROC_BIND=TRUE
export OMP_PLACES=cores
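
Exporting these variables affects every OpenMP program started from the same shell. As a minimal sketch, the same settings can be scoped to a single run with a small wrapper. The names run_pinned and ./fftz_app are hypothetical and used only for illustration; they are not part of FFTZ.

```shell
#!/bin/sh
# Hypothetical helper: run one command with the OpenMP affinity settings
# from this section, without exporting them into the current shell.
run_pinned() {
    env OMP_PROC_BIND=TRUE OMP_PLACES=cores "$@"
}

# Example usage (./fftz_app is a placeholder for your own FFTZ-based binary):
#   run_pinned ./fftz_app
```

Using env keeps the assignment local to the launched process, which makes it easier to compare pinned and unpinned runs from the same terminal.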
  • Linux-specific optimizations:
    • Enable the frequency boost feature on AMD processors to improve performance. This can be done by writing 1 to the /sys/devices/system/cpu/cpufreq/boost file as shown below:

    # you may need superuser privileges to do this
    echo 1 > /sys/devices/system/cpu/cpufreq/boost
    
    • Disabling the SMT (Simultaneous Multi-Threading) feature on AMD processors may improve performance for some workloads. This can be done by writing off to the /sys/devices/system/cpu/smt/control file as shown below:

    # you may need superuser privileges to do this
    echo off > /sys/devices/system/cpu/smt/control
    
    • Set the CPU governor to performance to prevent the CPU frequency from scaling down during execution. This can be done by writing performance to the scaling_governor file for each CPU core as shown below:

    # you may need superuser privileges to do this
    echo performance | tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
    
    • Enable transparent huge pages (THP) to improve memory access performance. This can be done by writing always to the enabled, defrag, and shmem_enabled files in the /sys/kernel/mm/transparent_hugepage/ directory as shown below:

    # you may need superuser privileges to do this
    echo 'always' > /sys/kernel/mm/transparent_hugepage/enabled
    echo 'always' > /sys/kernel/mm/transparent_hugepage/defrag
    echo 'always' > /sys/kernel/mm/transparent_hugepage/shmem_enabled
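
The sysfs settings above can be applied and checked with a few small helpers. This is a sketch under stated assumptions, not part of FFTZ: show_setting, write_setting, and show_tuning are hypothetical names, and cpu0 is used only as a representative core when reporting the governor.

```shell
#!/bin/sh
# Hypothetical helper: print "<path>: <value>" for a sysfs file, or
# "<path>: n/a" when the file is missing or unreadable (for example inside
# a container, or when the cpufreq driver does not expose the file).
show_setting() {
    if [ -r "$1" ]; then
        printf '%s: %s\n' "$1" "$(cat "$1")"
    else
        printf '%s: n/a\n' "$1"
    fi
}

# Hypothetical helper: write a value to one or more files through tee.
# A plain "sudo echo 1 > file" fails because the redirection is performed by
# the unprivileged shell; piping into "sudo tee" avoids that. Set SUDO to an
# empty string when already running as root.
write_setting() {
    value=$1
    shift
    for f in "$@"; do
        printf '%s\n' "$value" | ${SUDO-sudo} tee "$f" > /dev/null
    done
}

# Report the current state of the files discussed in this section.
show_tuning() {
    show_setting /sys/devices/system/cpu/cpufreq/boost
    show_setting /sys/devices/system/cpu/smt/control
    show_setting /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
    show_setting /sys/kernel/mm/transparent_hugepage/enabled
    show_setting /sys/kernel/mm/transparent_hugepage/defrag
    show_setting /sys/kernel/mm/transparent_hugepage/shmem_enabled
}

show_tuning

# Example usage (requires privileges):
#   write_setting performance /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
```

Running show_tuning before and after tuning makes it easy to confirm that a write took effect. The transparent huge page files list all accepted values and mark the active one with square brackets.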
    
    • Using custom memory allocators:

    • jemalloc:
      • For best performance, replace the default memory allocator as provided by libc with a high-performance memory allocator such as jemalloc.

        jemalloc is known to boost the performance of FFTZ in most cases, so it is highly recommended for performance-critical applications, specifically a build from its default dev branch.

        This can be done by preloading the jemalloc shared library using the LD_PRELOAD environment variable as shown below:

      export LD_PRELOAD=/path/to/libjemalloc.so
      
      • Additionally, jemalloc can be tuned using the MALLOC_CONF environment variable. The following settings are recommended, assuming jemalloc was installed from the dev branch:

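      # Written on one line: with backslash continuations inside the quotes, the
      # padding spaces become part of the value, which jemalloc may reject as a
      # malformed conf string.
      export MALLOC_CONF="background_thread:true,metadata_thp:auto,oversize_threshold:90388608,hpa_slab_max_alloc:4800929,dss:disabled,hpa:true,huge_arena_pac_thp:true,stats_print:false,dirty_decay_ms:40000,muzzy_decay_ms:40000"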
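
The option string above is a comma-separated list of key:value pairs and is easy to mistype. As a sketch, it can be assembled from a one-option-per-line list so that it stays readable while the final value contains no whitespace. The variable name opts is an illustrative choice; the options and values are copied unchanged from the recommendation above.

```shell
#!/bin/sh
# One option per line for readability. The options and values are copied
# unchanged from the recommended jemalloc settings in this section.
opts='
background_thread:true
metadata_thp:auto
oversize_threshold:90388608
hpa_slab_max_alloc:4800929
dss:disabled
hpa:true
huge_arena_pac_thp:true
stats_print:false
dirty_decay_ms:40000
muzzy_decay_ms:40000
'

# Drop blank lines and join the rest with commas, so the final value
# contains no spaces or newlines.
MALLOC_CONF=$(printf '%s\n' "$opts" | sed '/^$/d' | paste -sd, -)
export MALLOC_CONF

# Show the assembled value for inspection.
printf '%s\n' "$MALLOC_CONF"
```

Keeping the list in this form also makes it simple to comment out or change a single option when experimenting.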
      
    • amdalloc:
      • When compiling with AOCC (AMD’s LLVM-based CPU compiler), it is recommended to use amdalloc as the memory allocator. amdalloc can be found in the lib folder of the AOCC installation.

        Preload it at run time using the LD_PRELOAD environment variable:

      export LD_PRELOAD=/path/to/libamdalloc.so
      
      • Additionally, set the MALLOC_CONF environment variable as shown below for the best performance with amdalloc:

      export MALLOC_CONF="retain:true"
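
Exporting LD_PRELOAD applies the allocator to every program started from that shell, including unrelated tools. A minimal sketch of scoping the preload to one command follows; it applies equally to jemalloc and amdalloc. The helper name run_with_allocator and the ./fftz_app binary are hypothetical, and the library path remains a placeholder as above.

```shell
#!/bin/sh
# Hypothetical helper: run a command with an allocator preloaded for that
# command only. It fails early if the library file does not exist, because
# the dynamic loader would otherwise only print a warning and carry on with
# the default allocator.
run_with_allocator() {
    lib=$1
    shift
    if [ ! -f "$lib" ]; then
        echo "allocator library not found: $lib" >&2
        return 1
    fi
    env LD_PRELOAD="$lib" "$@"
}

# Example usage (both paths are placeholders):
#   run_with_allocator /path/to/libjemalloc.so ./fftz_app
```

To confirm that the allocator is actually in use, look for the library name in /proc/<pid>/maps of the running process.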