Architecture

ZenDNN User Guide (57300)

Document ID: 57300
Release Date: 2026-04-13
Revision: 5.2.1

When both the vLLM and zentorch packages are installed, vLLM automatically detects the zentorch platform and applies zentorch optimizations through torch.compile.

Figure 1. vLLM V1 Engine

The plugin leverages AMD EPYC™-specific intrinsics and optimizations to accelerate computation on AMD EPYC™ CPUs, although it may also run on other x86 CPUs that support the required instruction set architecture (ISA). zentorch compiles the LLM with torch.compile, replacing native PyTorch ops with zentorch's optimized ops.
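The op replacement can be pictured as a graph rewrite pass. In a real torch.compile flow such a pass would walk the nodes of the torch.fx graph captured by Dynamo; this minimal sketch instead uses a plain list of nodes, and the replacement table, including the `zentorch.*` op names, is hypothetical rather than zentorch's actual operator set. Ops without an optimized equivalent are left untouched, so they still run through the default path.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Node:
    """One operation in a toy dataflow graph."""
    name: str               # unique node name
    target: str             # operator to run, e.g. "aten.mm"
    args: Tuple[str, ...]   # names of the nodes or graph inputs it consumes


# Hypothetical table: native op name -> optimized replacement op name.
REPLACEMENTS: Dict[str, str] = {
    "aten.mm": "zentorch.mm",
    "aten.addmm": "zentorch.addmm",
}


def replace_ops(graph: List[Node], table: Dict[str, str]) -> List[Node]:
    """Return a new graph in which every op listed in `table` is swapped for its replacement."""
    return [replace(node, target=table.get(node.target, node.target)) for node in graph]


if __name__ == "__main__":
    graph = [
        Node("proj", "aten.addmm", ("bias", "x", "w")),
        Node("act", "aten.relu", ("proj",)),
    ]
    for node in replace_ops(graph, REPLACEMENTS):
        print(node.target)  # zentorch.addmm, then aten.relu
```

The pass only changes which kernel each node calls; the graph's structure and data dependencies stay the same, which is what lets optimized ops be substituted without altering model semantics.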

Key Components

The system consists of two main components that work together to enable CPU optimization.

  • The ZenCPUPlatform extends vLLM's CpuPlatform class and configures the platform for CPU execution. It sets up the compilation configuration with the inductor backend, using CompilationLevel.DYNAMO_ONCE or CompilationMode.DYNAMO_TRACE_ONCE depending on the vLLM version, and integrates zentorch operators by injecting zentorch._compile_backend.optimize_pass.
  • The plugin entry points (defined in __init__.py) handle initialization by registering with vLLM through the vllm.platform_plugins and vllm.general_plugins entry-point groups. Before model initialization, they apply the necessary patches and check that the installed vLLM version meets the plugin's requirements.
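The version validation done by the entry points can be sketched with the standard library alone. This is a minimal illustration that assumes the requirement is a minimum release number; the actual supported vLLM versions are not stated here, so `minimum` is left as a parameter, and the function names are hypothetical. Only the numeric release segment is compared, so a pre-release such as `0.10.0rc1` counts as `0.10.0`; `packaging.version.Version` gives full PEP 440 ordering where that distinction matters.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Tuple


def release_tuple(v: str) -> Tuple[int, ...]:
    """Extract the leading numeric release, e.g. '0.10.1rc1+cpu' -> (0, 10, 1)."""
    match = re.match(r"\d+(?:\.\d+)*", v)
    if match is None:
        raise ValueError(f"unparseable version string: {v!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def is_supported(installed: str, minimum: str) -> bool:
    """True if `installed` is at least `minimum`, treating missing components as 0."""
    a, b = release_tuple(installed), release_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))  # pad so (0, 10) compares equal to (0, 10, 0)
    b += (0,) * (width - len(b))
    return a >= b


def check_vllm_version(minimum: str) -> str:
    """Hypothetical general-plugin check: fail early if vLLM is missing or too old."""
    try:
        installed = version("vllm")
    except PackageNotFoundError as exc:
        raise RuntimeError("vLLM is not installed") from exc
    if not is_supported(installed, minimum):
        raise RuntimeError(f"vLLM {installed} is older than the required {minimum}")
    return installed
```

Running a check like this before any patches are applied means an incompatible installation fails at startup with a clear message, rather than partway through model loading.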