vLLM-zentorch Plugin - 57300

ZenDNN User Guide (57300)

Document ID: 57300
Release Date: 2025-08-18
Revision: 5.1 English

The vLLM-zentorch plugin brings together zentorch and vLLM to deliver efficient, plug-and-play large language model (LLM) inference on modern x86 CPU servers. By leveraging ZenDNN's highly optimized kernels, this plugin accelerates both attention and non-attention operations in vLLM, providing significant throughput improvements for popular LLMs.

zentorch is designed to accelerate PyTorch workloads on CPUs, offering drop-in, high-performance implementations of key deep learning operations. When used with vLLM, zentorch automatically replaces the default attention mechanisms and other compute-intensive kernels with ZenDNN-optimized versions, with no code changes required. While optimized for AMD EPYC™ CPUs, the plugin supports any x86 CPU with the required ISA features.
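The guide does not enumerate the required ISA features here, but on Linux a quick way to see which vector extensions a host exposes is to parse the `flags` line of `/proc/cpuinfo`. The sketch below is a generic helper, not part of zentorch; the AVX2/AVX-512 names used in the example are illustrative assumptions, not an official requirement list.

```python
def cpu_flags(cpuinfo_text: str) -> set[str]:
    """Parse the first 'flags' line of /proc/cpuinfo into a set of feature names."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            return set(line.split(":", 1)[1].split())
    return set()

def has_features(flags: set[str], required: tuple[str, ...] = ("avx2",)) -> bool:
    """Return True if every required ISA feature name is present."""
    return all(f in flags for f in required)

# Example on a trimmed /proc/cpuinfo excerpt (on a real host,
# read the contents of /proc/cpuinfo instead):
sample = "processor : 0\nflags\t: fpu sse2 avx avx2 avx512f\n"
print(has_features(cpu_flags(sample)))  # → True (avx2 present)
```

The same `has_features` call can be pointed at a stricter feature tuple (e.g. `("avx512f",)`) if a particular ZenDNN build documents one.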

Key Features

  • Plug-and-Play Acceleration: No code modifications are required; simply install zentorch alongside vLLM for automatic acceleration.
  • Seamless vLLM Integration: vLLM detects zentorch and transparently uses ZenDNN-optimized attention and non-attention kernels on supported CPUs.
  • Optimized for Modern x86 CPU Servers: Delivers best-in-class performance on AMD EPYC™ processors, while supporting a broad range of x86 CPUs with the necessary instruction set.
  • Powered by ZenDNN: Leverages AMD's ZenDNN library for state-of-the-art, CPU-optimized neural network operations.

Compatibility

  • vLLM: v0.9.0 or later (explicitly tested; earlier versions may not be supported)
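Because v0.9.0 is the compatibility floor, it can be useful to verify the installed vLLM version before relying on the plugin. The helper below is a hypothetical sketch using only the standard library; its dotted-integer parsing is a deliberate simplification that ignores pre-release and local suffixes (e.g. "rc1", "+cpu").

```python
from importlib import metadata

MIN_VLLM = (0, 9, 0)  # minimum version per the compatibility list above

def parse_version(v: str) -> tuple[int, ...]:
    """Parse a dotted version string like '0.9.1' into a tuple of ints.
    Simplified: non-digit characters in each component are dropped."""
    parts = []
    for p in v.split("."):
        digits = "".join(ch for ch in p if ch.isdigit())
        parts.append(int(digits) if digits else 0)
    return tuple(parts)

def vllm_is_supported(installed: str, minimum: tuple[int, ...] = MIN_VLLM) -> bool:
    """True if the installed version meets or exceeds the minimum."""
    return parse_version(installed) >= minimum

# Example: query the installed package, if present.
try:
    print(vllm_is_supported(metadata.version("vllm")))
except metadata.PackageNotFoundError:
    print("vllm is not installed")
```

For production checks, a full PEP 440 comparison (e.g. via the `packaging` library) would be more robust than this tuple comparison.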