The zentorch zip package, available for download from the AMD ZenDNN Developer Central page, includes a bash script that applies the recommended environment settings for best performance.
Before you run your workload, activate the conda environment where zentorch 5.1 is installed and source the zentorch_env_setup.sh file.
source scripts/zentorch_env_setup.sh --help
source scripts/zentorch_env_setup.sh --framework <zentorch|ipex> --model <llm|recsys|cnn|nlp> --threads <num_threads> --precision <amp|bf16|fp32|woq>
To choose the value for --threads (num_threads), check the output of the following shell command, which reports the number of cores per socket:
lscpu | awk '/^Core\(s\) per socket:/ {print $4}'
For example, if you are running your LLM workload in BF16 format on a 5th Gen AMD EPYC™ processor (codenamed Turin) with 192 cores per socket, source zentorch_env_setup.sh as follows:
source scripts/zentorch_env_setup.sh --framework zentorch --model llm --threads 192 --precision bf16
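The two steps above (reading the core count, then sourcing the script) can be combined so that the value is not typed by hand. The following is a minimal sketch, not part of the zentorch package: the helper name cores_per_socket and the variable NUM_THREADS are illustrative, and the script is assumed to be at scripts/zentorch_env_setup.sh relative to the current directory. LC_ALL=C is set because lscpu field labels can be localized, which would stop the pattern from matching.

```shell
# Extract the "Core(s) per socket" value from lscpu-style text on stdin.
# cores_per_socket is a hypothetical helper name, not part of zentorch.
cores_per_socket() {
  awk '/^Core\(s\) per socket:/ {print $4}'
}

# LC_ALL=C keeps the lscpu field labels in English so the pattern matches.
if command -v lscpu >/dev/null 2>&1; then
  NUM_THREADS="$(LC_ALL=C lscpu | cores_per_socket)"
fi

# Source the setup script only when a numeric value was found and the
# script exists at the assumed relative path.
case "${NUM_THREADS:-}" in
  ''|*[!0-9]*)
    echo "could not determine cores per socket" >&2
    ;;
  *)
    if [ -f scripts/zentorch_env_setup.sh ]; then
      source scripts/zentorch_env_setup.sh --framework zentorch --model llm \
        --threads "$NUM_THREADS" --precision bf16
    fi
    ;;
esac
```

The model and precision values here simply repeat the LLM/BF16 example; substitute the ones that match your workload.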
The script ensures that required utilities such as llvm-openmp, as well as memory allocation tools (for example, jemalloc), are installed and made available to zentorch.
Consult the Performance Tuning chapter for more details on the various environment variables.
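To see which environment variables the script actually changed in your shell, you can compare the environment before and after sourcing it. The sketch below is a generic shell technique rather than a zentorch feature; show_env_changes is a hypothetical helper name, and the script path is assumed to be relative to the current directory. Variables whose values span multiple lines may be reported imprecisely.

```shell
# Run a command in the current shell and print the environment entries
# that were added or changed by it.
# show_env_changes is a hypothetical helper, not part of zentorch.
show_env_changes() {
  _sec_before="$(mktemp)"
  _sec_after="$(mktemp)"
  env | sort > "$_sec_before"
  "$@"                               # runs in this shell, so exports persist
  env | sort > "$_sec_after"
  # comm -13 prints lines that appear only in the second (after) file.
  comm -13 "$_sec_before" "$_sec_after"
  rm -f "$_sec_before" "$_sec_after"
}

# Example use, guarded so nothing runs if the script is not present.
if [ -f scripts/zentorch_env_setup.sh ]; then
  show_env_changes source scripts/zentorch_env_setup.sh --framework zentorch \
    --model llm --threads 192 --precision bf16
fi
```

Any messages the script prints while it runs appear before the list of changed entries.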