The following environment variable settings are optimal for zentorch and should be used in addition to the other environment variable settings.
- CNN-based models
  - FP32 models
    - ZENDNNL_MATMUL_ALGO=1
  - BF16 (AMP) models
    - ZENDNNL_MATMUL_ALGO=1
- NLP-based models
  - FP32 models and BF16 (AMP) models (latency)
    - ZENDNNL_MATMUL_ALGO=1
  - FP32 models and BF16 (AMP) models (throughput)
    - ZENDNNL_MATMUL_ALGO=5
- LLM-based models
  - BF16 and WOQ (per-channel and per-group) models
    - ZENDNNL_MATMUL_ALGO=1
- RecSys models
  - FP32, BF16, INT8 and BF16 (AMP) models
    - ZENDNNL_MATMUL_ALGO=1
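
The recommendations above can be applied from a launcher script. The following is a minimal sketch, not part of zentorch: the table `RECOMMENDED_MATMUL_ALGO` and the helper `apply_matmul_algo` are hypothetical names introduced here for illustration. The table is keyed only by model family and workload, because the value listed above is the same for every precision within a family. It also assumes the variable should be set before `torch` and `zentorch` are imported. The text above does not say when the variable is read, so setting it first is the conservative choice.

```python
import os

# Values transcribed from the list above. Keys are (model family, workload).
# The key names are illustrative; "any" means the list gives a single value
# for that family regardless of latency or throughput.
RECOMMENDED_MATMUL_ALGO = {
    ("cnn", "any"): "1",
    ("nlp", "latency"): "1",
    ("nlp", "throughput"): "5",
    ("llm", "any"): "1",
    ("recsys", "any"): "1",
}


def apply_matmul_algo(family: str, workload: str = "any") -> str:
    """Set ZENDNNL_MATMUL_ALGO for a model family and return the value used."""
    family = family.lower()
    workload = workload.lower()
    # Prefer a workload-specific entry, then fall back to the family-wide one.
    for key in ((family, workload), (family, "any")):
        if key in RECOMMENDED_MATMUL_ALGO:
            value = RECOMMENDED_MATMUL_ALGO[key]
            os.environ["ZENDNNL_MATMUL_ALGO"] = value
            return value
    raise ValueError(
        f"no recommendation for family={family!r}, workload={workload!r}"
    )


if __name__ == "__main__":
    # Assumption: call this before importing torch / zentorch.
    chosen = apply_matmul_algo("nlp", "throughput")
    print(f"ZENDNNL_MATMUL_ALGO={chosen}")
```

For NLP-based models the workload must be given explicitly (`"latency"` or `"throughput"`), since the list recommends different values for the two cases. Exporting the variable in the shell before starting Python has the same effect.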