When both the vLLM and zentorch packages are installed, vLLM automatically
detects the zentorch platform and applies zentorch optimizations through torch.compile.
Figure 1. vLLM V1 Engine
The plugin leverages AMD EPYC™-specific intrinsics and optimizations to
accelerate computations on AMD EPYC™ CPUs. However, it may also work on other x86
CPUs that support the required ISA.

We use zentorch to compile the LLM with torch.compile, replacing the native ops with zentorch's
optimized ops.
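The op replacement described above can be pictured as a pass over the traced graph that swaps selected ops for optimized equivalents and leaves everything else untouched. The following is only a toy sketch of that idea in plain Python: the `Node` class, the `toy_optimize_pass` function, and the op names in `REPLACEMENTS` are hypothetical placeholders, not zentorch's actual data structures, pass, or operator names.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Node:
    """A toy stand-in for one node of a traced graph (hypothetical)."""
    name: str
    op: str
    inputs: tuple = ()


# Hypothetical mapping from native op names to optimized op names.
REPLACEMENTS = {
    "native.linear": "optimized.linear",
    "native.embedding_bag": "optimized.embedding_bag",
}


def toy_optimize_pass(graph, replacements=REPLACEMENTS):
    """Return a new graph where ops listed in `replacements` are swapped.

    Ops without a replacement are kept as-is, so the graph structure
    (node names and inputs) is unchanged.
    """
    return [replace(node, op=replacements.get(node.op, node.op)) for node in graph]


graph = [
    Node("x", "placeholder"),
    Node("h", "native.linear", ("x",)),
    Node("y", "native.relu", ("h",)),
    Node("out", "output", ("y",)),
]

optimized = toy_optimize_pass(graph)

if __name__ == "__main__":
    for before, after in zip(graph, optimized):
        print(f"{before.name}: {before.op} -> {after.op}")
```

Only the `native.linear` node changes in this example; ops with no optimized counterpart fall through to their native implementation, which is the behavior one would expect from a selective replacement pass.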
Key Components
The system consists of two main components that work together to enable CPU optimization.
- The `ZenCPUPlatform` extends vLLM's `CpuPlatform` class and configures the system for CPU. It establishes the compilation configuration using `CompilationLevel.DYNAMO_ONCE` or `CompilationMode.DYNAMO_TRACE_ONCE` with the `inductor` backend, and integrates `zentorch` operators through the `zentorch._compile_backend.optimize_pass` injection.
- The Plugin Entry Points (defined in `__init__.py`) handle the initialization process by registering with vLLM through the `vllm.platform_plugins` and `vllm.general_plugins` mechanisms. These entry points apply necessary patches before model initialization occurs and ensure compatibility by validating the vLLM version requirements.
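To check which plugins are registered in a given environment, the entry point groups named above can be inspected with the standard library. This is a minimal sketch, not part of the plugin itself: the helper names `list_plugins`, `parse_version`, and `meets_minimum` are hypothetical, and the minimum version used in the example is a placeholder rather than the plugin's real requirement.

```python
import re
from importlib.metadata import PackageNotFoundError, entry_points, version


def list_plugins(group):
    """Return sorted (name, target) pairs registered under an entry point group."""
    eps = entry_points()
    if hasattr(eps, "select"):      # Python 3.10 and later
        selected = eps.select(group=group)
    else:                           # older Pythons return a dict of groups
        selected = eps.get(group, [])
    return sorted((ep.name, ep.value) for ep in selected)


def parse_version(text):
    """Extract the leading numeric release, e.g. '0.11.0+cpu' -> (0, 11, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"no numeric version found in {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(installed, minimum):
    """True if the installed release is at least the minimum release."""
    return parse_version(installed) >= parse_version(minimum)


if __name__ == "__main__":
    for group in ("vllm.platform_plugins", "vllm.general_plugins"):
        print(group, list_plugins(group))
    try:
        installed = version("vllm")
        # "0.0.0" is a placeholder; substitute the documented requirement.
        print("vllm", installed, "ok:", meets_minimum(installed, "0.0.0"))
    except PackageNotFoundError:
        print("vllm is not installed in this environment")
```

If zentorch is installed correctly, its entries should appear under both groups; an empty list suggests the package is missing or was installed into a different environment.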