Use the following tuning guidelines to get the best performance out of FFTZ:
Build with the -DENABLE_MULTI_THREADING=ON CMake option to enable OpenMP-based multi-threading support, provided the application allows the library to use multiple threads.
Setting the OpenMP variables OMP_PROC_BIND and OMP_PLACES to TRUE and cores, respectively, may improve performance for multi-threaded runs. This can be done as shown below:
export OMP_PROC_BIND=TRUE
export OMP_PLACES=cores
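Exporting these variables affects every later command in the shell session. A minimal sketch that applies the same two settings to a single run instead is shown here. The helper name run_with_omp_binding is hypothetical, and the commented configure lines assume an out-of-source build directory named build, which may not match the FFTZ source layout.

```bash
# Configure step (assumed out-of-source layout, directory name "build"):
#   cmake -S . -B build -DENABLE_MULTI_THREADING=ON
#   cmake --build build

# Hypothetical helper: run one command with the OpenMP binding settings
# recommended above, without exporting them into the rest of the session.
run_with_omp_binding() {
  OMP_PROC_BIND=TRUE OMP_PLACES=cores "$@"
}
```

For example, run_with_omp_binding ./my_fftz_app, where my_fftz_app stands in for your own FFTZ-based program.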
- Linux-specific optimizations:
Enable the frequency boost feature on AMD processors to improve performance. This can be done by writing 1 to the /sys/devices/system/cpu/cpufreq/boost file as shown below:
# you may need superuser privileges to do this
echo 1 > /sys/devices/system/cpu/cpufreq/boost
Disabling the SMT (Simultaneous Multi-Threading) feature on AMD processors may improve performance for some workloads. This can be done by writing off to the /sys/devices/system/cpu/smt/control file as shown below:
# you may need superuser privileges to do this
echo off > /sys/devices/system/cpu/smt/control
Set the CPU governor to performance to prevent the CPU frequency from scaling down during execution. This can be done by writing performance to the scaling_governor file of each CPU core as shown below:
# you may need superuser privileges to do this
echo performance | tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
Enable transparent huge pages to improve memory access performance. This can be done by writing always to the enabled, defrag and shmem_enabled files in the /sys/kernel/mm/transparent_hugepage/ directory as shown below:
# you may need superuser privileges to do this
echo 'always' > /sys/kernel/mm/transparent_hugepage/enabled
echo 'always' > /sys/kernel/mm/transparent_hugepage/defrag
echo 'always' > /sys/kernel/mm/transparent_hugepage/shmem_enabled
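Before and after changing these settings, it helps to confirm what the kernel is actually using. The sketch below reads every file mentioned above without modifying anything, and adds a small writer for when you are not in a root shell. The helper names show_knob, report_tuning_state and set_knob are hypothetical, and it assumes sudo is available for non-root users.

```bash
# Print the current value of one sysfs file, or "n/a" if it is missing
# (e.g. on kernels/CPUs without the feature, or inside some containers).
show_knob() {
  if [ -r "$2" ]; then
    printf '%-12s %s\n' "$1:" "$(cat "$2")"
  else
    printf '%-12s %s\n' "$1:" "n/a"
  fi
}

# Read-only report of the settings discussed above; safe to run as any user.
report_tuning_state() {
  show_knob boost /sys/devices/system/cpu/cpufreq/boost
  show_knob smt /sys/devices/system/cpu/smt/control
  # Collapse the per-core governors into the distinct values in use.
  governors=$(cat /sys/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_governor 2>/dev/null | sort -u | tr '\n' ' ')
  printf '%-12s %s\n' "governor:" "${governors:-n/a}"
  show_knob thp_enabled /sys/kernel/mm/transparent_hugepage/enabled
  show_knob thp_defrag /sys/kernel/mm/transparent_hugepage/defrag
  show_knob thp_shmem /sys/kernel/mm/transparent_hugepage/shmem_enabled
}

# Write a value to a sysfs file. tee is used for non-root users because in
# "sudo echo 1 > file" the redirection is done by the unprivileged shell.
set_knob() {
  if [ "$(id -u)" -eq 0 ]; then
    printf '%s\n' "$1" > "$2"
  else
    printf '%s\n' "$1" | sudo tee "$2" > /dev/null
  fi
}
```

For example, set_knob off /sys/devices/system/cpu/smt/control followed by report_tuning_state. In the transparent huge page files, the active value is the one shown in brackets, such as [always] madvise never.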
Using custom memory allocators:
- jemalloc:
For best performance, replace the default libc memory allocator with a high-performance allocator such as jemalloc.
Because jemalloc is known to boost the performance of FFTZ in most cases, it is highly recommended for performance-critical applications, specifically jemalloc built from its default dev branch.
This can be done by preloading the jemalloc shared library using the LD_PRELOAD environment variable as shown below:
export LD_PRELOAD=/path/to/libjemalloc.so
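If the LD_PRELOAD path is wrong, the glibc dynamic loader prints a warning, ignores the entry, and the application silently runs on the default allocator. A quick check, sketched below under the assumption of a Linux system with /proc, is to start a short-lived process with the same LD_PRELOAD and look at its own memory map. The helper name preload_is_active is hypothetical.

```bash
# Succeed if a library matching PATTERN ($2) is mapped into a process started
# with LD_PRELOAD set to LIB_PATH ($1). grep reads its own /proc/self/maps,
# so this reflects what the dynamic loader actually loaded (Linux only).
preload_is_active() {
  LD_PRELOAD="$1" grep -q "$2" /proc/self/maps 2>/dev/null
}

# Example with the placeholder path used above.
if preload_is_active /path/to/libjemalloc.so jemalloc; then
  echo "jemalloc is preloaded"
else
  echo "jemalloc was not loaded; check the LD_PRELOAD path" >&2
fi
```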
Additionally, jemalloc can be tuned using the MALLOC_CONF environment variable. The following MALLOC_CONF settings are recommended for jemalloc built from the dev branch:
export MALLOC_CONF="background_thread:true,\
metadata_thp:auto,\
oversize_threshold:90388608,\
hpa_slab_max_alloc:4800929,\
dss:disabled,\
hpa:true,\
huge_arena_pac_thp:true,\
stats_print:false,\
dirty_decay_ms:40000,\
muzzy_decay_ms:40000"
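jemalloc reads MALLOC_CONF as key:value pairs separated by bare commas, so whitespace left over from line continuations (for example a space before a key) can be rejected as a malformed option string. One way to avoid that is to assemble the value programmatically, as in this sketch. The helper name join_malloc_conf is hypothetical; the option values are exactly the ones listed above.

```bash
# Join the arguments with commas and no whitespace (hypothetical helper).
# The subshell keeps the IFS change from leaking into the caller.
join_malloc_conf() {
  (IFS=,; printf '%s' "$*")
}

MALLOC_CONF=$(join_malloc_conf \
  background_thread:true \
  metadata_thp:auto \
  oversize_threshold:90388608 \
  hpa_slab_max_alloc:4800929 \
  dss:disabled \
  hpa:true \
  huge_arena_pac_thp:true \
  stats_print:false \
  dirty_decay_ms:40000 \
  muzzy_decay_ms:40000)
export MALLOC_CONF
```

Temporarily switching stats_print to true makes jemalloc print its statistics report when the process exits, which is one way to confirm that the options were picked up.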
- amdalloc:
When compiling with AOCC (AMD’s LLVM-based CPU compiler), it is recommended to use amdalloc as the memory allocator. The amdalloc library can be found in the lib folder of the AOCC installation. Use LD_PRELOAD to load it at runtime:
export LD_PRELOAD=/path/to/libamdalloc.so
Additionally, set the MALLOC_CONF environment variable as shown below for the best performance with amdalloc:
export MALLOC_CONF="retain:true"
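The two amdalloc steps can also be applied to a single command rather than the whole session. This is a sketch only: the helper names find_amdalloc and run_with_amdalloc are hypothetical, and it assumes libamdalloc.so sits directly in the lib folder of the AOCC installation, whose prefix is passed as the first argument.

```bash
# Print the path of libamdalloc.so under an AOCC install prefix ($1),
# or fail with a message if it is not where this sketch assumes.
find_amdalloc() {
  amd_lib="$1/lib/libamdalloc.so"
  if [ -f "$amd_lib" ]; then
    printf '%s\n' "$amd_lib"
  else
    echo "libamdalloc.so not found under $1/lib" >&2
    return 1
  fi
}

# Run one command with amdalloc preloaded and MALLOC_CONF=retain:true,
# keeping any LD_PRELOAD entries that are already set.
run_with_amdalloc() {
  amd_lib=$(find_amdalloc "$1") || return 1
  shift
  LD_PRELOAD="$amd_lib${LD_PRELOAD:+:$LD_PRELOAD}" MALLOC_CONF=retain:true "$@"
}
```

For example, run_with_amdalloc /path/to/aocc ./my_fftz_app, where both arguments are placeholders for your own installation and program.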