vLLM-zentorch Plugin

ZenDNN User Guide (57300)

Document ID: 57300
Release Date: 2026-04-13
Revision: 5.2.1 (English)

The zentorch vLLM plugin integrates zentorch with vLLM's V1 engine to deliver optimized large language model (LLM) inference on AMD EPYC™ CPUs. Using ZenDNN's optimized kernels, the plugin accelerates both attention and non-attention operations in vLLM, improving inference throughput for popular LLMs.

The plugin uses vLLM's platform and general plugin entry points to:

  • Inject zentorch optimization passes into torch.compile
  • Disable vLLM's replacement of operators with Intel oneDNN kernels, so that zentorch kernels can be substituted instead
  • Enable CPU-only profiling

Key Features

  • Plug-and-Play Acceleration: No code modifications are required; install zentorch alongside vLLM for automatic acceleration.
  • Seamless vLLM Integration: vLLM detects zentorch and transparently uses ZenDNN-optimized GEMM and Embedding kernels for supported CPUs.
  • Optimized for Modern x86 CPUs: Delivers best-in-class performance on AMD EPYC™ processors, while supporting a broad range of x86 CPUs with the necessary instruction set.
  • Powered by ZenDNN: Leverages AMD's ZenDNN library for state-of-the-art, CPU-optimized neural network operations.
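Because acceleration depends only on zentorch being installed in the same environment as vLLM, a short environment check can catch the most common setup mistake. The sketch below reports the installed version of each package, or None if it is missing. It assumes the distribution names are "vllm" and "zentorch"; this is a standalone sanity check, not the mechanism vLLM itself uses to detect the plugin.

```python
"""Check that vLLM and zentorch are installed in the active environment.

Assumption: the distribution names are "vllm" and "zentorch"; adjust REQUIRED
if your wheels are published under different names.
"""
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional

REQUIRED = ("vllm", "zentorch")


def installed_versions(packages: Iterable[str] = REQUIRED) -> Dict[str, Optional[str]]:
    """Return each package's installed version string, or None if it is not installed."""
    result: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            # Package metadata not found in this environment.
            result[name] = None
    return result


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name}: {ver or 'NOT INSTALLED'}")
```

Run it with the same Python interpreter that launches vLLM; a "NOT INSTALLED" line for either package means the plugin cannot take effect in that environment.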