Hardware configuration, operating system, kernel, and BIOS settings all play an important role in performance. The following sections list the environment variables used on a 5th Gen AMD EPYC™ server to obtain the best performance numbers.
Recommendation
For optimal performance with vLLM CPU inference, set the temperature parameter to 0.0 (greedy decoding) and run on a supported x86 CPU. The best results are obtained on the latest AMD EPYC™ CPUs.
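
As an illustration of the temperature recommendation, the sketch below builds a request body for a vLLM server's OpenAI-compatible completions endpoint with the temperature fixed at 0.0. The helper name `build_completion_request`, the model name, and the `max_tokens` default are placeholders chosen for this example, not values from this guide. The code only constructs the payload and does not contact a server.

```python
import json


def build_completion_request(model: str, prompt: str, max_tokens: int = 64) -> dict:
    """Build a request body for an OpenAI-compatible /v1/completions endpoint.

    The helper name and the max_tokens default are illustrative assumptions.
    """
    return {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        # Temperature 0.0 selects greedy decoding, as recommended above.
        "temperature": 0.0,
    }


if __name__ == "__main__":
    # "my-model" is a hypothetical placeholder; substitute the served model name.
    body = build_completion_request("my-model", "Hello, world")
    print(json.dumps(body, indent=2))
```

The resulting dictionary can be sent as the JSON body of a POST request to the server's completions route. When using the vLLM Python API directly instead, the equivalent setting is the `temperature` argument of `SamplingParams`.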