Usage

ZenDNN User Guide (57300)

Document ID: 57300
Release Date: 2026-04-13
Revision: 5.2.1 English

No code changes are required. Once installed, simply run your vLLM inference workload as usual. The plugin will be automatically detected and used for inference on supported x86 CPUs that meet the required ISA features. While optimized for AMD EPYC™ CPUs, it may also function on other compatible x86 processors.

Note: Upon importing vLLM, you should see the following message in the logs:
INFO [__init__.py] Platform plugin zentorch is activated
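In automated runs it can be handy to confirm activation by scanning captured log output for that message. The following is a minimal sketch, not part of the plugin: the helper name `zentorch_activated` is hypothetical, and it assumes the log text contains the message exactly as quoted above.

```python
# Message text taken from the note above; the helper itself is illustrative only.
ACTIVATION_MESSAGE = "Platform plugin zentorch is activated"


def zentorch_activated(log_text: str) -> bool:
    """Return True if any line of the captured log contains the activation message."""
    return any(ACTIVATION_MESSAGE in line for line in log_text.splitlines())


sample_log = "INFO [__init__.py] Platform plugin zentorch is activated\n"
print(zentorch_activated(sample_log))
```

How the log text is captured (redirected stderr, a log file, and so on) depends on your setup and is left out here.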

Environment Configuration

Running the plugin with ZENDNNL_MATMUL_ALGO=1 (the default) is recommended.

Environment Variables

export VLLM_CPU_KVCACHE_SPACE=120         # GB for KV cache
export VLLM_CPU_OMP_THREADS_BIND=0-127    # CPU cores to use
export TORCHINDUCTOR_FREEZING=1           # enable TorchInductor freezing
export VLLM_USE_AOT_COMPILE=0             # disable vLLM AOT compilation
export TORCHINDUCTOR_AUTOGRAD_CACHE=0     # disable the AOTAutograd cache
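If the workload is launched from a Python script rather than a shell, the same settings can be applied programmatically. This is a rough sketch under stated assumptions: `RECOMMENDED_ENV` and `apply_recommended_env` are hypothetical names, the values are copied from the export lines above and should be adjusted to your machine, and the variables are assumed to need setting before vLLM is imported.

```python
import os

# Values copied from the export lines above; adjust cores and cache size to your machine.
RECOMMENDED_ENV = {
    "VLLM_CPU_KVCACHE_SPACE": "120",
    "VLLM_CPU_OMP_THREADS_BIND": "0-127",
    "TORCHINDUCTOR_FREEZING": "1",
    "VLLM_USE_AOT_COMPILE": "0",
    "TORCHINDUCTOR_AUTOGRAD_CACHE": "0",
}


def apply_recommended_env(env=None):
    """Fill in missing settings without overriding values that are already set.

    With no argument this updates os.environ; pass a dict to build an
    environment for a child process instead.
    """
    target = os.environ if env is None else env
    for name, value in RECOMMENDED_ENV.items():
        target.setdefault(name, value)
    return target


# A value chosen by the user is kept; the remaining settings are filled in.
env = apply_recommended_env({"VLLM_CPU_KVCACHE_SPACE": "40"})
print(env["VLLM_CPU_KVCACHE_SPACE"], env["TORCHINDUCTOR_FREEZING"])
```

Using `setdefault` means anything already exported in the shell takes precedence, which keeps per-machine overrides intact.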

Performance Libraries

Install and preload tcmalloc and llvm-openmp for best performance:
# tcmalloc (the following command is for Ubuntu)
sudo apt-get install libtcmalloc-minimal4
export LD_PRELOAD="/usr/lib/x86_64-linux-gnu/libtcmalloc_minimal.so.4:$LD_PRELOAD"

# llvm-openmp
conda install -c conda-forge llvm-openmp=18.1.8=hf5423f3_1 -y
export LD_PRELOAD="$CONDA_PREFIX/lib/libiomp5.so:$LD_PRELOAD"
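When the workload is started from a launcher script, the LD_PRELOAD value can be assembled in Python and passed to the child process. The sketch below is illustrative only: `build_ld_preload` is a hypothetical helper, and the library paths are the ones from the commands above and may differ on your system. It skips libraries that are not present on disk and avoids empty or duplicate entries. LD_PRELOAD is read when a process starts, so the value must be in the environment used to launch the workload.

```python
import os


def build_ld_preload(candidates, existing="", exists=os.path.isfile):
    """Join candidate libraries that exist with any existing LD_PRELOAD entries.

    Empty entries and duplicates are dropped; candidates come first.
    """
    entries = [path for path in candidates if exists(path)]
    entries += [path for path in existing.split(":") if path]
    ordered = []
    for path in entries:
        if path not in ordered:
            ordered.append(path)
    return ":".join(ordered)


# Paths taken from the commands above (assumption: Ubuntu plus an active conda env).
conda_prefix = os.environ.get("CONDA_PREFIX", "")
libraries = [
    os.path.join(conda_prefix, "lib", "libiomp5.so"),
    "/usr/lib/x86_64-linux-gnu/libtcmalloc_minimal.so.4",
]

# Environment for a child process, for example via subprocess.run(..., env=child_env).
child_env = dict(os.environ)
child_env["LD_PRELOAD"] = build_ld_preload(libraries, os.environ.get("LD_PRELOAD", ""))
print(child_env["LD_PRELOAD"])
```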

Example

from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-8B", dtype="bfloat16")
params = SamplingParams(temperature=0.8, top_p=0.95)
outputs = llm.generate(["Hello, world!"], sampling_params=params)

# generate() returns one result per prompt; print the generated text of each.
for output in outputs:
    print(output.outputs[0].text)

Note: These hardware recommendations are specific to vLLM CPU workloads. ZenTorch can be used independently and may have different requirements or optimizations for other use cases.

Support and Feedback

For questions, feedback, or to contribute, visit the AMD ZenDNN PyTorch Plugin GitHub page.