The NoC master unit (NMU) is the ingress point to the NoC. The NMU provides:
- Asynchronous clock domain crossing and rate matching between the AXI master and the NoC.
- Conversion between the AXI protocol and the NoC Packet Protocol (NPP).
- Address matching and route control.
- WRAP burst support for 32, 64, and 128-bit interfaces.
- INCR and FIXED burst support.
- Read re-tagging to allow out-of-order service and prevent interconnect blocking.
- Write order enforcement.
- Ingress QoS control.
- Handling of the AXI exclusive access feature.
- Support for configurable data widths from 32 to 512 bits on AXI interfaces and from 128 to 512 bits on AXI4-Stream interfaces. Parameter propagation from the connected IP sets the AXI data width.
- AXI4 and AXI4-Stream support.
- Acceptance of up to 64 AXI reads and 64 AXI writes.
- The maximum size of an NPP write is 256 bytes. An AXI write that is more than 256 bytes can span multiple NPP writes.
- Support for up to 64 outstanding NPP reads of 32 bytes each. The read re-order buffer (RROB) holds 64 32-byte entries. An AXI read that is more than 32 bytes consumes multiple entries. The RROB in HBM_NMU holds 64 64-byte entries.
- A 512-byte write buffer.
- DDR controller interleaving support at 128 B to 4 KB interleave granularity.
- Each NMU can access a maximum of seven DDR controllers, or eight if you enable Controller Interleaving.
- Programmable virtual channel mapping.
- The NMU is available in two variants:
  - Full functionality: all the above specifications apply. This variant is used in the programmable logic (PL).
  - Latency-optimized: a fixed 128-bit wide AXI interface, and all transactions are address-route based.

Note: Integrated blocks (CIPS and AI Engine) use latency-optimized NMU/NSU blocks, while the PL uses full-functionality blocks.
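The RROB figures above imply a simple budget for read concurrency. The Python sketch below assumes that an AXI read consumes ceil(size / entry size) RROB entries and ignores any address-alignment effects; `rrob_entries_needed` and `max_concurrent_reads` are hypothetical helpers for illustration, not part of any AMD tool or API.

```python
import math

# Figures from the feature list: 64 RROB entries, 32 bytes each in the
# standard NMU and 64 bytes each in the HBM_NMU; up to 64 AXI reads accepted.
RROB_ENTRIES = 64
MAX_AXI_READS = 64
ENTRY_BYTES = {"NMU": 32, "HBM_NMU": 64}


def rrob_entries_needed(read_bytes: int, variant: str = "NMU") -> int:
    """Entries one AXI read occupies (assumed: ceil(size / entry size),
    ignoring alignment)."""
    if read_bytes <= 0:
        raise ValueError("read size must be positive")
    return math.ceil(read_bytes / ENTRY_BYTES[variant])


def max_concurrent_reads(read_bytes: int, variant: str = "NMU") -> int:
    """How many equal-sized reads can be in flight before the RROB fills,
    capped by the 64-read AXI acceptance limit."""
    per_read = rrob_entries_needed(read_bytes, variant)
    return min(MAX_AXI_READS, RROB_ENTRIES // per_read)


for size in (32, 64, 256):
    # Prints "32 1 64", "64 2 32", "256 8 8" for the standard NMU.
    print(size, rrob_entries_needed(size), max_concurrent_reads(size))
```

Under these assumptions, larger reads trade concurrency for size: eight 256-byte reads fill a standard NMU's RROB, while the HBM_NMU's 64-byte entries let the same reads take half as many entries each.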
The NMU resides on the transaction-initiator side of the system. It includes a standard AXI4 interface with optional sideband signals that provide additional addressing and routing controls. The previous figure shows an asynchronous data-crossing block and a rate-matching block, which together form the interface between the NoC and the application (AXI) side of the master unit. The rate-matching block buffers write data from the slower application clock domain until enough payload has accumulated to prevent write bubbles.
Data packetizing occurs when AXI requests enter the NMU clock domain. Packetizing breaks read and write transactions into smaller transfers, a process called chopping. Chopping always occurs on chop-size-aligned boundaries. Two parameters affect chopping: the chop size (fixed at 256 bytes) and the memory interleave granularity (when two DDR controllers are interleaved). Channel interleaving within a memory controller does not affect chopping.
If the memory interleave granularity is smaller than 256 bytes, packetizing chops reads and writes into transfers equal to the interleave granularity. Non-interleaved transactions, and transactions whose interleave granularity is 256 bytes or greater, are chopped into 256-byte transfers. These chops are aligned to 256-byte address boundaries; they are not consecutive 256-byte chunks counted from the start address.
For example, a 1 KB transfer starting at address 0x0 is split into four transactions: 0-255, 256-511, 512-767, and 768-1023. A 1 KB transfer starting at address 128, however, is split into five transactions: 128-255, 256-511, 512-767, 768-1023, and 1024-1151, because each chop must respect the 256-byte address boundaries. Each chopped transaction is further divided into NoC packets called flits. Each flit carries up to 16 bytes of data in addition to various header information.
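The chopping rule can be sketched in Python as follows. This is a behavioral model under stated assumptions, not the hardware implementation: `chop` and `flits_for` are hypothetical names, the chop size is taken as min(256, interleave granularity) with chops aligned to that size, and the flit count covers data bytes only (headers and any sub-flit alignment are ignored).

```python
import math
from typing import List, Optional, Tuple

CHOP_SIZE = 256        # fixed chop size, in bytes
FLIT_DATA_BYTES = 16   # maximum data bytes carried by one flit


def chop(start: int, length: int,
         interleave: Optional[int] = None) -> List[Tuple[int, int]]:
    """Split [start, start + length) into inclusive (first, last) byte ranges.

    interleave: DDR interleave granularity in bytes, or None when the
    transaction is not interleaved. Granularities of 256 bytes or more
    leave the 256-byte chop size unchanged.
    """
    size = CHOP_SIZE if interleave is None else min(CHOP_SIZE, interleave)
    chops = []
    addr, end = start, start + length
    while addr < end:
        boundary = (addr // size + 1) * size  # next size-aligned address
        stop = min(boundary, end)
        chops.append((addr, stop - 1))
        addr = stop
    return chops


def flits_for(chop_range: Tuple[int, int]) -> int:
    """Data flits needed for one chop (assumption: data only, no headers)."""
    first, last = chop_range
    return math.ceil((last - first + 1) / FLIT_DATA_BYTES)


print(chop(0, 1024))    # four chops, matching the first example in the text
print(chop(128, 1024))  # five chops, matching the second example
```

Walking to the next aligned boundary, rather than stepping by a fixed length from the start address, is what produces the extra fifth chop when the transfer starts at address 128.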
In parallel with the packetizing process, address lookup determines the destination ID.
For reads, the NMU re-tags each read packet and assigns an ordering ID based on available slots in the Read Reorder Buffer (RROB). The RROB maintains a linked list of the tags assigned to each AXI ID, which ensures that responses return in the correct order.
The final stage before a packet is injected into the NoC switch fabric is access QoS control.
Read responses are placed in the RROB. In accordance with AXI ordering rules, logic at the output of the buffer selects read responses to return to the requesting AXI master. This logic relies on the linked list structure from the request path to determine the correct response ordering.
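A toy Python model of the RROB ordering behavior described above may help. It is a sketch, not RTL: the class name `ReadReorderBuffer` is hypothetical, a per-ID `deque` stands in for the hardware linked list, and each read occupies a single slot here, whereas the real RROB allocates multiple 32-byte entries to larger reads.

```python
from collections import deque


class ReadReorderBuffer:
    """Releases read responses in request order per AXI ID, while allowing
    responses for different IDs to return independently."""

    def __init__(self, num_slots: int = 64):
        self.free_tags = deque(range(num_slots))  # available RROB slots
        self.order = {}  # axi_id -> deque of tags, oldest request first
        self.done = {}   # tag -> response data that has come back from the NoC

    def issue(self, axi_id):
        """Re-tag a read request; return its tag, or None if no slot is free."""
        if not self.free_tags:
            return None
        tag = self.free_tags.popleft()
        self.order.setdefault(axi_id, deque()).append(tag)
        return tag

    def receive(self, tag, data):
        """Store a response that may have arrived out of order."""
        self.done[tag] = data

    def drain(self):
        """Return (axi_id, data) for every response now at the head of its
        ID's list, freeing the corresponding slots."""
        out = []
        for axi_id, tags in self.order.items():
            while tags and tags[0] in self.done:
                tag = tags.popleft()
                out.append((axi_id, self.done.pop(tag)))
                self.free_tags.append(tag)
        return out


rrob = ReadReorderBuffer()
a0, a1 = rrob.issue(0), rrob.issue(0)  # two reads on AXI ID 0
b0 = rrob.issue(1)                     # one read on AXI ID 1
rrob.receive(a1, "A1")                 # second ID-0 response arrives first
rrob.receive(b0, "B0")
print(rrob.drain())  # B0 is released; A1 waits behind the outstanding A0
rrob.receive(a0, "A0")
print(rrob.drain())  # A0 and then A1, in request order
```

Holding A1 until A0 returns reflects the AXI rule that responses with the same ID must return in order, while B0, which has a different ID, is free to overtake both.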