The following environment variable settings are optimal for zentf and should be used in addition to the other environment variable settings.
- Use LLVM OpenMP for all models except Recommendation System (Recsys) models
  - For Recsys models, use GOMP_CPU_AFFINITY
- Use KMP settings for all models except Recsys models
  - export KMP_BLOCKTIME=1
  - export KMP_TPAUSE=0
  - export KMP_FORKJOIN_BARRIER_PATTERN=dist,dist
  - export KMP_PLAIN_BARRIER_PATTERN=dist,dist
  - export KMP_REDUCTION_BARRIER_PATTERN=dist,dist
  - export KMP_AFFINITY=granularity=fine,compact,1,0
- Use Jemalloc as the memory allocator
- TF_NUM_INTEROP_THREADS=1 (for CNN, Hugging Face NLP, and LLM models)
- TF_NUM_INTRAOP_THREADS=128 (for CNN, Hugging Face NLP, and LLM models on Turin machines)
- OMP_PROC_BIND=FALSE
- USE_ZENDNN_MATMUL_DIRECT=1
- ZENDNNL_MATMUL_WEIGHT_CACHE=1
- ZENDNNL_MATMUL_ALGO
  - NLP & LLM models
    - For FP32 and direct BF16 models:
      - Throughput: export ZENDNNL_MATMUL_ALGO=2
      - Latency: export ZENDNNL_MATMUL_ALGO=1
    - For BF16 (AMP) models: export ZENDNNL_MATMUL_ALGO=4
  - DIEN (Recsys) models
    - For FP32 and direct BF16 models: export ZENDNNL_MATMUL_ALGO=1
    - For BF16 (AMP) models: export ZENDNNL_MATMUL_ALGO=4
  - CNN models
    - For FP32 and direct BF16 models: export ZENDNNL_MATMUL_ALGO=1
    - For BF16 (AMP) models: export ZENDNNL_MATMUL_ALGO=4
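
The settings above can be collected into a small shell helper. The following is a minimal sketch, not an official zentf script. The function names `zentf_matmul_algo` and `zentf_apply_env` and the argument keywords (`nlp`, `llm`, `recsys`, `cnn`, `fp32`, `bf16`, `amp`, `throughput`, `latency`) are hypothetical and chosen for illustration. It assumes that the variables listed without a model qualifier apply to every model family. It does not select the OpenMP runtime, load Jemalloc, or set `GOMP_CPU_AFFINITY`, since those depend on paths and core lists that are not given here.

```shell
#!/usr/bin/env bash
# Sketch only: helper names and argument keywords are hypothetical.

# Print the ZENDNNL_MATMUL_ALGO value for a model family, precision, and mode.
#   family:    nlp | llm | recsys | cnn
#   precision: fp32 | bf16 (direct BF16) | amp (BF16 through AMP)
#   mode:      throughput (default) | latency; only used for nlp/llm
zentf_matmul_algo() {
    local family="$1"
    local precision="$2"
    local mode="${3:-throughput}"

    case "$family" in
        nlp|llm|recsys|cnn) ;;
        *) echo "unknown model family: $family" >&2; return 1 ;;
    esac
    case "$precision" in
        fp32|bf16|amp) ;;
        *) echo "unknown precision: $precision" >&2; return 1 ;;
    esac

    # BF16 (AMP) models use algorithm 4 for every model family.
    if [ "$precision" = "amp" ]; then
        echo 4
        return 0
    fi

    # FP32 and direct BF16 models.
    case "$family" in
        nlp|llm)
            if [ "$mode" = "latency" ]; then echo 1; else echo 2; fi
            ;;
        *)
            echo 1
            ;;
    esac
}

# Export the environment variables from the list for one configuration.
zentf_apply_env() {
    local family="$1"
    local precision="$2"
    local mode="${3:-throughput}"
    local algo
    algo="$(zentf_matmul_algo "$family" "$precision" "$mode")" || return 1

    if [ "$family" != "recsys" ]; then
        # KMP settings apply to all models except Recsys models.
        export KMP_BLOCKTIME=1
        export KMP_TPAUSE=0
        export KMP_FORKJOIN_BARRIER_PATTERN=dist,dist
        export KMP_PLAIN_BARRIER_PATTERN=dist,dist
        export KMP_REDUCTION_BARRIER_PATTERN=dist,dist
        export KMP_AFFINITY=granularity=fine,compact,1,0
        # Thread counts are listed for CNN, Hugging Face NLP, and LLM models;
        # 128 is the value given for Turin machines.
        export TF_NUM_INTEROP_THREADS=1
        export TF_NUM_INTRAOP_THREADS=128
    fi

    # Assumption: these apply to every model family.
    export OMP_PROC_BIND=FALSE
    export USE_ZENDNN_MATMUL_DIRECT=1
    export ZENDNNL_MATMUL_WEIGHT_CACHE=1
    export ZENDNNL_MATMUL_ALGO="$algo"
}

# Example: FP32 NLP model tuned for throughput.
zentf_apply_env nlp fp32 throughput
echo "ZENDNNL_MATMUL_ALGO=$ZENDNNL_MATMUL_ALGO"
```

Source the script in the shell that launches the TensorFlow workload so the exported variables are inherited by that process. On machines other than Turin, `TF_NUM_INTRAOP_THREADS` likely needs a different value.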