Enable ZenDNN in WeGO PyTorch
wego_mod = wego_torch.compile(mod, wego_torch.CompileOptions(
    ...
    optimize_options=wego_torch.OptimizeOptions(zendnn_enable=True))
)
After ZenDNN is enabled, the CPU operators in the compiled WeGO graph (that is, the operators not supported by the DPU) are replaced with ZenDNN operators and executed using ZenDNN kernels for acceleration.
Environment Variables
Name | Description |
---|---|
OMP_DYNAMIC | Set it explicitly to FALSE to enable ZenDNN. |
ZENDNN_GEMM_ALGO | The default value is 3. Set it to 0, 1, 2, 3, or 4 to select a different GEMM algorithm path. |
OMP_NUM_THREADS | The default value is the number of physical cores on the system. Tune it according to the number of inference threads to achieve better performance. For more details, see the tuning guidelines. |
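As an illustration, the variables in the table can also be set from Python rather than from the shell. The sketch below is a minimal example, not part of WeGO or ZenDNN: the helper name `configure_zendnn_env` is hypothetical, and it assumes the variables are set before the OpenMP runtime is initialized (in practice, before importing `torch` or `wego_torch`), because OpenMP runtimes typically read them only once at startup.

```python
import os


def configure_zendnn_env(omp_num_threads, gemm_algo=3):
    """Hypothetical helper: set the ZenDNN-related environment variables.

    Assumption: call this before importing torch / wego_torch, so that the
    OpenMP runtime sees the values when it starts.
    """
    if gemm_algo not in (0, 1, 2, 3, 4):
        raise ValueError("ZENDNN_GEMM_ALGO must be one of 0, 1, 2, 3, 4")
    if omp_num_threads < 1:
        raise ValueError("OMP_NUM_THREADS must be at least 1")

    # Environment variable values are always strings.
    os.environ["OMP_DYNAMIC"] = "FALSE"
    os.environ["ZENDNN_GEMM_ALGO"] = str(gemm_algo)
    os.environ["OMP_NUM_THREADS"] = str(omp_num_threads)


# Example: keep the default GEMM path and use 4 OpenMP threads per pool.
configure_zendnn_env(omp_num_threads=4)
```

Exporting the same variables in the shell before launching the application has the same effect and avoids any dependence on import order.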
Tuning Guidelines
ZenDNN uses OpenMP as its underlying threading library. The OMP_NUM_THREADS environment variable controls intra-op parallelism, that is, multi-core parallelism inside ZenDNN kernels. With OpenMP, different application threads (inter-op threads) can each use a separate OpenMP thread pool for intra-op tasks. A multi-threaded application can therefore create many OpenMP threads, which consume CPU cores and reduce overall performance. To avoid over-subscription, set OMP_NUM_THREADS according to the number of cores on the target CPU platform and the number of threads used in your application. For example, if your application launches 16 threads on a platform with 64 CPU cores, set OMP_NUM_THREADS <= 4 (16 application threads x 4 OpenMP threads each = 64 cores) to avoid contention for CPU cores.
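The arithmetic behind this guideline can be sketched as a small helper. This is a minimal illustration, not a WeGO or ZenDNN API: the function name `recommended_omp_threads` is hypothetical, and it assumes that every application thread drives its own OpenMP thread pool, so the upper bound is the core count divided by the application thread count.

```python
def recommended_omp_threads(cpu_cores, app_threads):
    """Hypothetical helper: upper bound for OMP_NUM_THREADS.

    Assumes each application (inter-op) thread uses its own OpenMP pool,
    so app_threads * OMP_NUM_THREADS should not exceed cpu_cores.
    """
    if cpu_cores < 1 or app_threads < 1:
        raise ValueError("cpu_cores and app_threads must be at least 1")
    # Integer division rounds down; never return fewer than 1 thread.
    return max(1, cpu_cores // app_threads)


# The example from the text: 16 application threads on a 64-core platform.
print(recommended_omp_threads(64, 16))  # 4
```

The result is an upper bound rather than a guaranteed optimum, so smaller values may still be worth benchmarking. Pass the number of physical cores: `os.cpu_count()` reports logical CPUs, which can be higher when simultaneous multithreading is enabled.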