Usage

No code changes are required: once the plugin is installed, run your vLLM inference workload as usual. The plugin is detected automatically and used for inference on supported x86 CPUs that provide the required ISA features. While optimized for AMD EPYC™ CPUs, it may also work on other compatible x86 processors.

# Example: Standard vLLM inference code
from vllm import LLM, SamplingParams

llm = LLM(model="microsoft/phi-2")
params = SamplingParams(temperature=0.0, top_p=0.95)
outputs = llm.generate(["Hello, world!"], sampling_params=params)

# generate() returns a list of RequestOutput objects; print the generated text.
for output in outputs:
    print(output.outputs[0].text)

The zentorch plugin accelerates attention automatically when it is installed and the workload runs on a supported x86 CPU (best performance on AMD EPYC™ CPUs).
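To confirm that acceleration is possible on a given machine, a minimal sketch like the following can help. The zentorch package name is taken from this guide, but treating AVX-512 as the required ISA feature is an assumption; check the release notes for the authoritative list.

# Sketch: check whether zentorch is importable and whether the CPU
# advertises AVX-512 (an assumed requirement, not confirmed by this guide).
import importlib.util

def zentorch_available() -> bool:
    # True if the zentorch package is importable in this environment.
    return importlib.util.find_spec("zentorch") is not None

def cpu_has_avx512() -> bool:
    # Linux-only check via /proc/cpuinfo; assumes the avx512f flag is
    # among the ISA features the plugin requires.
    try:
        with open("/proc/cpuinfo") as f:
            return "avx512f" in f.read()
    except OSError:
        return False

print("zentorch importable:", zentorch_available())
print("CPU reports AVX-512:", cpu_has_avx512())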

Recommendation

For optimal performance with vLLM CPU inference, set the temperature parameter to 0.0 (greedy decoding) and use supported x86 CPUs, with best results on the latest AMD EPYC™ CPUs. If NUMA is enabled on the hardware platform, also use the NPS (NUMA nodes per socket) setting that performs best for your workload.
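As a practical aid, here is a minimal, hypothetical sketch that counts the NUMA nodes visible on Linux and prints an example numactl binding. The best NPS setting is workload dependent, so benchmark each configuration rather than assuming one; the script name below is a placeholder.

# Sketch: discover NUMA nodes and suggest a numactl binding (Linux only).
import glob

# NUMA nodes appear as /sys/devices/system/node/node0, node1, ... on Linux.
nodes = sorted(glob.glob("/sys/devices/system/node/node[0-9]*"))
print(f"NUMA nodes visible: {len(nodes)}")

if len(nodes) > 1:
    # Binding CPU and memory to a single node is one common starting point.
    print("Example: numactl --cpunodebind=0 --membind=0 python your_vllm_script.py")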

Note: These hardware recommendations are specific to vLLM CPU workloads. zentorch can be used independently and may have different requirements or optimizations for other use cases.
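For reference, standalone use with plain PyTorch, independent of vLLM, looks roughly like the sketch below, based on zentorch's documented torch.compile integration; verify the backend name and behavior against the version you have installed.

# Sketch: using zentorch directly with PyTorch via torch.compile.
import torch
import zentorch  # importing zentorch registers its torch.compile backend

# A toy model; any eager-mode PyTorch module works the same way.
model = torch.nn.Sequential(
    torch.nn.Linear(1024, 1024),
    torch.nn.ReLU(),
).eval()

compiled = torch.compile(model, backend="zentorch")

with torch.no_grad():
    x = torch.randn(8, 1024)
    print(compiled(x).shape)  # torch.Size([8, 1024])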

Support and Feedback

For questions, feedback, or to contribute, visit the AMD ZenDNN PyTorch Plugin GitHub page.