numactl
numactl provides options to run processes with a specific scheduling and memory placement policy. It can restrict memory binding and process scheduling to specific CPUs or NUMA nodes.
- --cpunodebind=nodes: Restricts the process to a specific group of NUMA nodes.
- --physcpubind=cpus: Restricts the process to a specific set of physical CPUs.
- --membind=nodes: Allocates memory only from the listed nodes. The allocation fails if there is not enough memory available on those nodes.
- --interleave=nodes: Allocates memory in a round-robin manner across the specified nodes. When memory cannot be allocated on the current target node, the allocation falls back to the other nodes.
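For illustration, each option can be used on its own as follows; the node and CPU ranges and <model_run_script> are placeholder values, not recommendations from this guide:
numactl --cpunodebind=0 python <model_run_script>        # schedule only on the CPUs of node 0
numactl --physcpubind=0-15 python <model_run_script>     # pin to physical CPUs 0 through 15
numactl --membind=0,1 python <model_run_script>          # allocate memory from nodes 0 and 1 only
numactl --interleave=0,1 python <model_run_script>       # interleave memory across nodes 0 and 1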
Example
If <model_run_script> is the application that needs to run on the server, it can be launched with numactl settings as follows:
numactl --cpunodebind=0-3 --interleave=0-3 python <model_run_script>
The interleave option of numactl takes effect only when more than one node is allocated to a particular application. cpunodebind and physcpubind behave the same way for the ZenDNN stack, whereas interleaved memory allocation performs better than membind.
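To check the interleave-versus-membind difference on a specific workload, the two policies can be timed side by side. This is a sketch, with <model_run_script> again standing in for the actual application:
time numactl --cpunodebind=0-3 --interleave=0-3 python <model_run_script>   # interleaved allocation
time numactl --cpunodebind=0-3 --membind=0-3 python <model_run_script>      # bound allocation, for comparison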
The number of concurrent executions can be increased beyond 4 nodes. The following formula can be used to decide the number of concurrent executions to trigger at a time:
Number of Concurrent Executions = Number of Cores per Socket / Number of Cores Sharing an L3 Cache
This can also be extended down to individual cores. However, you must verify these details empirically.
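As a worked example under assumed hardware values: on a socket with 64 cores where groups of 8 cores share an L3 cache, the formula gives 64 / 8 = 8 concurrent executions. The actual values for a machine can be read from lscpu, and the instances can then be launched with one numactl binding per L3 group. The loop below is a sketch that assumes contiguous CPU numbering within each L3 group; <model_run_script> is a placeholder:

# Inspect cores per socket and L3 sharing (the CACHE column groups CPUs by shared cache).
lscpu | grep -E 'Core\(s\) per socket|L3'
lscpu -e=CPU,CORE,SOCKET,NODE,CACHE

# Launch 8 concurrent instances, each pinned to its own 8-core L3 group.
# Memory is allocated from the local node by default (--localalloc makes this explicit).
for i in $(seq 0 7); do
  start=$((i * 8)); end=$((start + 7))
  numactl --physcpubind=${start}-${end} --localalloc python <model_run_script> &
done
wait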