Optimal Environment Variable Settings for zentf

ZenDNN User Guide (57300)

Document ID: 57300
Release Date: 2026-04-13
Revision: 5.2.1 English

The following environment variable settings are recommended for best zentf performance and should be used in addition to the general environment variable settings.

  • Use LLVM OpenMP for all models except Recommendation System (Recsys) models
    • For Recsys models, use GOMP_CPU_AFFINITY
  • Use KMP settings for all models except Recsys models
    • export KMP_BLOCKTIME=1
    • export KMP_TPAUSE=0
    • export KMP_FORKJOIN_BARRIER_PATTERN=dist,dist
    • export KMP_PLAIN_BARRIER_PATTERN=dist,dist
    • export KMP_REDUCTION_BARRIER_PATTERN=dist,dist
    • export KMP_AFFINITY=granularity=fine,compact,1,0
  • Use Jemalloc as memory allocator
  • export TF_NUM_INTEROP_THREADS=1 (for CNN, Hugging Face NLP, and LLM models)
  • export TF_NUM_INTRAOP_THREADS=128 (for CNN, Hugging Face NLP, and LLM models on a Turin machine)
  • export OMP_PROC_BIND=FALSE
  • export USE_ZENDNN_MATMUL_DIRECT=1
  • export ZENDNNL_MATMUL_WEIGHT_CACHE=1
  • ZENDNNL_MATMUL_ALGO: set according to model type and precision, as listed in the sections that follow
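
The common settings above can be collected into one script that is sourced before launching a workload. The following is a minimal sketch for non-Recsys models (CNN, Hugging Face NLP, and LLM). It makes two assumptions: that Jemalloc is enabled by preloading it through LD_PRELOAD, a common approach that this guide does not prescribe, and that the library lives at the default path shown. JEMALLOC_LIB is a hypothetical variable name used only for illustration. Selecting the LLVM OpenMP runtime itself is not shown.

```shell
# Sketch: baseline zentf settings for CNN, Hugging Face NLP, and LLM models.
# Source this file (". ./zentf_env.sh") so the exports reach the calling shell.

# LLVM OpenMP runtime tuning (not intended for Recsys models).
export KMP_BLOCKTIME=1
export KMP_TPAUSE=0
export KMP_FORKJOIN_BARRIER_PATTERN=dist,dist
export KMP_PLAIN_BARRIER_PATTERN=dist,dist
export KMP_REDUCTION_BARRIER_PATTERN=dist,dist
export KMP_AFFINITY=granularity=fine,compact,1,0

# TensorFlow threading. 128 intra-op threads is the value listed for a
# Turin machine; other machines may need a different value.
export TF_NUM_INTEROP_THREADS=1
export TF_NUM_INTRAOP_THREADS=128

export OMP_PROC_BIND=FALSE
export USE_ZENDNN_MATMUL_DIRECT=1
export ZENDNNL_MATMUL_WEIGHT_CACHE=1

# ASSUMPTION: Jemalloc is enabled via LD_PRELOAD, and the default path below
# is only an example. Override JEMALLOC_LIB (hypothetical name) as needed.
JEMALLOC_LIB="${JEMALLOC_LIB:-/usr/lib/x86_64-linux-gnu/libjemalloc.so.2}"
if [ -f "$JEMALLOC_LIB" ]; then
    # Prepend Jemalloc while keeping any libraries already preloaded.
    export LD_PRELOAD="${JEMALLOC_LIB}${LD_PRELOAD:+:$LD_PRELOAD}"
else
    echo "warning: $JEMALLOC_LIB not found, Jemalloc not enabled" >&2
fi
```

ZENDNNL_MATMUL_ALGO is deliberately left unset here because its value depends on the model type and precision.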

NLP & LLM models

  • For FP32 and direct BF16 models
    • Throughput: export ZENDNNL_MATMUL_ALGO=2
    • Latency: export ZENDNNL_MATMUL_ALGO=1
  • For BF16 (AMP) models: export ZENDNNL_MATMUL_ALGO=4

DIEN (Recsys) models

  • For FP32 and direct BF16 models: export ZENDNNL_MATMUL_ALGO=1
  • For BF16 (AMP) models: export ZENDNNL_MATMUL_ALGO=4
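
For a Recsys model such as DIEN, the KMP settings are skipped and thread placement is controlled through GOMP_CPU_AFFINITY instead. The sketch below assumes an FP32 model and assumes that threads should be pinned to every online CPU in order. This guide does not specify the CPU range, so adjust it to the cores actually assigned to the workload. NUM_CPUS is a hypothetical helper variable.

```shell
# Sketch: settings for an FP32 DIEN (Recsys) model.

# ASSUMPTION: pin threads to CPUs 0..N-1, where N is the online CPU count.
NUM_CPUS=$(nproc)
export GOMP_CPU_AFFINITY="0-$((NUM_CPUS - 1))"

export OMP_PROC_BIND=FALSE
export USE_ZENDNN_MATMUL_DIRECT=1
export ZENDNNL_MATMUL_WEIGHT_CACHE=1

# FP32 and direct BF16 DIEN models use algorithm 1; BF16 (AMP) would use 4.
export ZENDNNL_MATMUL_ALGO=1
```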

CNN models

  • For FP32 and direct BF16 models: export ZENDNNL_MATMUL_ALGO=1
  • For BF16 (AMP) models: export ZENDNNL_MATMUL_ALGO=4
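
The per-model ZENDNNL_MATMUL_ALGO values listed above can be encoded in a small lookup function so that launch scripts do not hard-code them. This is an illustrative sketch only: zentf_matmul_algo and its argument names (nlp, llm, dien, cnn, fp32, bf16, amp) are hypothetical and are not part of zentf or ZenDNN.

```shell
# Hypothetical helper: prints the ZENDNNL_MATMUL_ALGO value for a model.
# Usage: zentf_matmul_algo <family> <precision> [goal]
#   family:    nlp | llm | dien | cnn
#   precision: fp32 | bf16 | amp   (bf16 = direct BF16, amp = BF16 via AMP)
#   goal:      throughput | latency (required only for nlp/llm, non-AMP)
zentf_matmul_algo() {
    case "$1" in
        nlp|llm|dien|cnn) ;;
        *) echo "unknown model family: $1" >&2; return 1 ;;
    esac
    case "$2" in
        amp)
            # BF16 (AMP) models use algorithm 4 for every listed family.
            echo 4 ;;
        fp32|bf16)
            if [ "$1" = "nlp" ] || [ "$1" = "llm" ]; then
                case "${3:-}" in
                    throughput) echo 2 ;;
                    latency)    echo 1 ;;
                    *) echo "goal must be throughput or latency" >&2; return 1 ;;
                esac
            else
                # DIEN and CNN models use algorithm 1.
                echo 1
            fi ;;
        *) echo "unknown precision: $2" >&2; return 1 ;;
    esac
}

# Example: an FP32 LLM tuned for throughput.
ZENDNNL_MATMUL_ALGO=$(zentf_matmul_algo llm fp32 throughput)
export ZENDNNL_MATMUL_ALGO
```

Keeping the mapping in one function means a change in the recommended values only needs to be made in one place.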