Known Limitations

ZenDNN User Guide (57300)

Document ID: 57300
Release Date: 2026-04-13
Revision: 5.2.1 English

The following are the known limitations and the recommended settings.

  • The zentorch library requires the g++ compiler as a dependency. Ensure that the installed g++ version matches the system's gcc version.
  • ALGO 3 has a higher memory footprint during execution with AMP precision for the following models: Starcoder2-15b, Starcoder2-7b, and Qwen-QwQ-32B.
  • We recommend using:
    • ZENDNNL_MATMUL_ALGO=1 for CNN AMP and NLP AMP latency models
    • ZENDNNL_MATMUL_ALGO=5 for NLP AMP throughput (mixed precision) models
  • For best performance, we recommend running models with freezing enabled. To enable freezing, set the environment variable: export TORCHINDUCTOR_FREEZING=1
  • When running models with vLLM 0.18.0 and TORCHINDUCTOR_FREEZING=1, also set:
    export VLLM_USE_AOT_COMPILE=0
    export TORCHINDUCTOR_AUTOGRAD_CACHE=0
  • If you encounter an error while installing the sentencepiece package with Python 3.13, install it using:
    conda install -c conda-forge sentencepiece
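
The settings in the list above can be collected into one file and sourced before running a model. The following is a minimal sketch, assuming a POSIX-compatible shell. The function names zendnn_env and check_compilers and their arguments are hypothetical helpers introduced here for illustration; they are not part of ZenDNN, zentorch, or vLLM. Only the environment variable names and values come from the list above.

```shell
# Hypothetical helpers for illustration; not part of ZenDNN or zentorch.

# Export the settings recommended in the list above.
#   $1: "latency" (default) or "throughput"
#   $2: "vllm" when running vLLM 0.18.0; omit otherwise
zendnn_env() {
    if [ "${1:-latency}" = "throughput" ]; then
        export ZENDNNL_MATMUL_ALGO=5   # NLP AMP throughput
    else
        export ZENDNNL_MATMUL_ALGO=1   # CNN AMP and NLP AMP latency
    fi

    export TORCHINDUCTOR_FREEZING=1    # run models with freezing enabled

    # Needed only for vLLM 0.18.0 combined with freezing.
    if [ "${2:-}" = "vllm" ]; then
        export VLLM_USE_AOT_COMPILE=0
        export TORCHINDUCTOR_AUTOGRAD_CACHE=0
    fi
}

# Return 0 if gcc and g++ report the same version, 1 otherwise.
check_compilers() {
    if ! command -v gcc >/dev/null 2>&1 || ! command -v g++ >/dev/null 2>&1; then
        echo "warning: gcc or g++ not found on PATH" >&2
        return 1
    fi
    if [ "$(gcc -dumpversion)" != "$(g++ -dumpversion)" ]; then
        echo "warning: gcc and g++ versions differ" >&2
        return 1
    fi
}

zendnn_env latency
check_compilers || echo "check the compiler setup before installing zentorch" >&2
```

Source the file (for example with the `.` command) rather than executing it, so that the exported variables persist in the current shell. For a throughput run under vLLM 0.18.0, the call would be `zendnn_env throughput vllm`.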