Non-uniform Memory Access

ZenDNN User Guide

Document ID
57300
Release Date
2025-08-18
Revision
5.1 English

numactl

numactl runs processes with a specific scheduling and memory placement policy. It can restrict memory allocation and process scheduling to specific CPUs or NUMA nodes.

  • --cpunodebind=nodes: Restricts the process to the specified NUMA nodes.
  • --physcpubind=cpus: Restricts the process to the specified physical CPUs.
  • --membind=nodes: Allocates memory only from the listed nodes. The allocation fails if there is not enough memory on those nodes.
  • --interleave=nodes: Allocates memory in a round-robin manner across the specified nodes. When memory cannot be allocated on the current target node, the allocation falls back to the other nodes.

Example

If <model_run_script> is the application that needs to run on the server, then it can be triggered using numactl settings as follows:
numactl --cpunodebind=0-3 --interleave=0-3 python <model_run_script>

The --interleave option of numactl takes effect only when more than one node is allocated to the application. --cpunodebind and --physcpubind behave the same way for the ZenDNN stack, whereas interleaved memory allocation performs better than --membind.
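To illustrate, the two memory policies differ only in the flag passed to numactl. The sketch below reuses the node range 0-3 and the <model_run_script> placeholder from the example above, and prints the two command lines instead of executing them:

```shell
# Sketch: the same workload under the two memory-placement policies.
# NODES and the application name are placeholders; adjust for your system.
NODES="0-3"
APP="python <model_run_script>"

# Interleaved allocation: pages are spread round-robin across nodes 0-3,
# falling back to other nodes if a target node is full.
echo "numactl --cpunodebind=${NODES} --interleave=${NODES} ${APP}"

# Strict binding: allocation fails if nodes 0-3 run out of memory.
echo "numactl --cpunodebind=${NODES} --membind=${NODES} ${APP}"
```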

The number of concurrent executions can be increased beyond 4 nodes. The following formula can be used to decide the number of concurrent executions to trigger at a time:
Number of Concurrent Executions = Number of Cores per Socket / Number of Cores Sharing an L3 Cache

This granularity can be extended further, even down to individual cores. However, you must verify these configurations empirically.
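As a concrete illustration of the formula, consider a hypothetical socket with 64 cores where 8 cores share each L3 cache (these counts are assumptions for the example, not values from this guide): 64 / 8 gives 8 concurrent executions. The sketch below computes the count and prints, without executing, one numactl invocation per L3 domain, binding each instance to the physical cores of that domain:

```shell
# Hypothetical topology: 64 cores per socket, 8 cores per L3 cache.
cores_per_socket=64
cores_per_l3=8
concurrent=$((cores_per_socket / cores_per_l3))
echo "concurrent executions: ${concurrent}"

# Print one command line per L3 domain; <model_run_script> is a placeholder.
i=0
while [ "$i" -lt "$concurrent" ]; do
  start=$((i * cores_per_l3))
  end=$((start + cores_per_l3 - 1))
  echo "numactl --physcpubind=${start}-${end} python <model_run_script>"
  i=$((i + 1))
done
```

Check the actual core-to-L3 mapping of your system (for example with lscpu) before choosing the core ranges, since core numbering varies across platforms.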