The NoC master unit (NMU) is the ingress point to the NoC. The NMU provides:
- Asynchronous clock domain crossing and rate matching between the AXI master and the NoC.
- Conversion between the AXI protocol and the NoC Packet Protocol (NPP).
- Address matching and route control.
- WRAP burst support for 32, 64, and 128-bit interfaces.
- INCR and FIXED burst support.
- Read re-tagging to allow out-of-order service and prevent interconnect blocking.
- Write order enforcement.
- Ingress QoS control.
- Handling of the AXI exclusive access feature.
- Support for configurable data widths from 32 to 512 bits on AXI interfaces and from 128 to 512 bits on AXI4-Stream interfaces. The AXI data width is configured via parameter propagation from the connected IP.
- AXI4 and AXI4-Stream support.
- Acceptance of up to 64 AXI reads and 64 AXI writes.
- The maximum size of an NPP write is 256 bytes. An AXI write of more than 256 bytes spans multiple NPP writes.
- Support for up to 64 outstanding NPP reads of 32 bytes each. The read reorder buffer (RROB) holds 64 32-byte entries; an AXI read of more than 32 bytes consumes multiple entries (see the sketch after this list). The RROB in the HBM_NMU holds 64 64-byte entries.
- 512-byte write buffer.
- DDR controller interleaving support at 128-byte to 4 KB interleave granularity.
- Programmable virtual channel mapping.
- The NMU is available in two variants:
  - Full functionality: all of the above specifications apply; this variant is used in the programmable logic (PL).
  - Latency-optimized: fixed 128-bit wide AXI interface, and all transactions are address route based.

Note: Integrated blocks (CIPS and AI Engine) use latency-optimized NMU/NSU blocks, while the PL uses full functionality blocks.
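The per-transaction resource usage implied by the figures above (256-byte NPP writes, 32-byte RROB entries, 64 entries per NMU) can be illustrated with a small back-of-the-envelope model. The sketch below is purely illustrative and not part of any tool flow; the constants come from the list above, and the function names are assumptions.

```python
# Illustrative model of RROB entry consumption, using the figures listed above.
RROB_ENTRIES = 64      # entries in one NMU read reorder buffer
ENTRY_BYTES = 32       # bytes per entry (64 in the HBM_NMU variant)

def rrob_entries_for_read(read_bytes: int, entry_bytes: int = ENTRY_BYTES) -> int:
    """Number of RROB entries a single AXI read consumes (ceiling division)."""
    return -(-read_bytes // entry_bytes)

def max_outstanding_reads(read_bytes: int, entry_bytes: int = ENTRY_BYTES) -> int:
    """How many reads of this size the RROB can track at once."""
    return RROB_ENTRIES // rrob_entries_for_read(read_bytes, entry_bytes)

# A 256-byte AXI read occupies 8 standard entries, so only 8 can be outstanding;
# with the HBM_NMU's 64-byte entries, the same read occupies 4 entries.
assert rrob_entries_for_read(256) == 8 and max_outstanding_reads(256) == 8
assert rrob_entries_for_read(256, entry_bytes=64) == 4
```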
The NMU is located at the transaction initiator side of the system. It is equipped with a standard AXI4 interface that includes optional sideband signals providing additional addressing and routing controls. As shown in the previous figure, an asynchronous clock domain crossing and a rate matching block form the interface between the NoC side and the application (AXI) side of the master unit. The rate matching block buffers write data from the slower application domain until there is enough payload to prevent write bubbles.
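As a rough behavioral picture of rate matching (hold back write data in the slow domain until a full payload is available), consider the toy model below. The class name and the 64-byte release threshold are illustrative assumptions, not documented values.

```python
from typing import Optional

class RateMatchingBuffer:
    """Toy model: accumulate write beats arriving from a slower AXI clock domain
    and release them only once enough payload is buffered to avoid write bubbles."""

    def __init__(self, release_threshold_bytes: int = 64):
        self.threshold = release_threshold_bytes   # assumed value, for illustration
        self.buffered = bytearray()

    def push_beat(self, beat: bytes) -> None:
        """Slow (application) side: enqueue one write data beat."""
        self.buffered.extend(beat)

    def pop_payload(self) -> Optional[bytes]:
        """Fast (NoC) side: return a payload only when the threshold is met."""
        if len(self.buffered) < self.threshold:
            return None
        payload = bytes(self.buffered)
        self.buffered.clear()
        return payload
```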
Data packetizing is performed when AXI requests enter the NMU clock domain. As part of the packetizing process, read and write transactions are broken into smaller transfers (this process is called chopping). Chopping is always performed on chop-size aligned boundaries. Two parameters affect chopping: the chop size (fixed at 256 bytes) and the memory interleave size (when two DDR controllers are interleaved). Channel interleaving within a memory controller does not affect chopping. If the memory interleave granularity is smaller than 256 bytes, reads and writes are chopped into transfers equal to the interleave granularity. Non-interleaved transactions, or transactions with an interleave granularity greater than or equal to 256 bytes, are chopped into 256-byte transfers. The chopped transactions are aligned to 256-byte address boundaries; they are not simply cut into 256-byte chunks. For example, a 1 KB transfer starting at address 0x0 is split into four transactions with the following chops: 0-255, 256-511, 512-767, and 768-1023. A 1 KB transaction starting at address 128, however, is split into five transactions: 128-255, 256-511, 512-767, 768-1023, and 1024-1151, to comply with the 256-byte address boundary rule. Each chopped transaction is divided into NoC packets, or "flits." Each flit can carry up to 16 bytes of data in addition to various header information.
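The chopping rule just described can be captured in a few lines: choose the smaller of the 256-byte chop size and the interleave granularity (when it is below 256 bytes), and split at boundaries aligned to that size. The sketch below is illustrative only and reproduces the two worked examples from the text; the function name and signature are assumptions.

```python
def chop(start_addr, length, interleave_bytes=None, chop_bytes=256):
    """Split an AXI transfer into (first_byte, last_byte) ranges on aligned boundaries."""
    unit = chop_bytes
    if interleave_bytes is not None and interleave_bytes < chop_bytes:
        unit = interleave_bytes   # interleave granularity below 256 B takes precedence
    chops = []
    addr, remaining = start_addr, length
    while remaining > 0:
        step = min(remaining, unit - (addr % unit))  # stop at the next aligned boundary
        chops.append((addr, addr + step - 1))
        addr += step
        remaining -= step
    return chops

# 1 KB transfer at address 0x0: four aligned 256-byte chops.
assert chop(0, 1024) == [(0, 255), (256, 511), (512, 767), (768, 1023)]
# 1 KB transfer at address 128: five chops to respect the 256-byte boundary rule.
assert chop(128, 1024) == [(128, 255), (256, 511), (512, 767),
                           (768, 1023), (1024, 1151)]
```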
In parallel with the packetizing process, address lookup is performed to determine the destination ID.
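Conceptually, the lookup matches the transaction address against a set of programmed address regions, each associated with a destination ID used for routing. The region table below is entirely hypothetical; real address maps and destination IDs are produced by the NoC configuration tools.

```python
# Hypothetical address map: (base, size, destination_id). Values are made up.
ADDRESS_MAP = [
    (0x0000_0000, 0x8000_0000, 0x11),   # e.g., region served by DDR controller 0
    (0x8000_0000, 0x8000_0000, 0x12),   # e.g., region served by DDR controller 1
]

def lookup_destination(addr):
    """Return the destination ID whose region contains this address."""
    for base, size, dest_id in ADDRESS_MAP:
        if base <= addr < base + size:
            return dest_id
    raise ValueError(f"address {addr:#x} does not match any programmed region")
```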
In read processes, read re-tagging is performed on each read packet, and an ordering ID is assigned based on available slots in the read reorder buffer (RROB). The RROB maintains a per-AXI-ID linked list of assigned tags, allowing responses to be returned in the correct order.
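A minimal software model of this bookkeeping, assuming made-up class and method names, might allocate tags from a free list and chain them per AXI ID so the response side can later walk each chain in issue order. This is a sketch of the described behavior, not the hardware implementation.

```python
from collections import deque

class ReadReorderBuffer:
    """Toy model of RROB tag assignment and in-order response release."""

    def __init__(self, num_entries=64):
        self.free_tags = deque(range(num_entries))  # unused RROB slots
        self.chains = {}       # AXI ID -> deque of tags, in issue order
        self.responses = {}    # tag -> response payload, filled as data returns

    def assign_tag(self, axi_id):
        """Request path: allocate a slot and append it to this AXI ID's chain."""
        tag = self.free_tags.popleft()
        self.chains.setdefault(axi_id, deque()).append(tag)
        return tag

    def complete(self, tag, payload):
        """Response path: a (possibly out-of-order) read response has arrived."""
        self.responses[tag] = payload

    def drain(self, axi_id):
        """Output side: yield responses for this AXI ID in issue order, stopping
        at the first tag whose data has not arrived yet."""
        chain = self.chains.get(axi_id, deque())
        while chain and chain[0] in self.responses:
            tag = chain.popleft()
            yield self.responses.pop(tag)
            self.free_tags.append(tag)   # slot is free for a new read
```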
The final stage before a packet is injected into the NoC switch fabric is ingress QoS control.
Read responses are placed in the RROB. In accordance with AXI ordering rules, logic at the output of the buffer selects read responses to return to the requesting AXI master. This logic relies on the linked list structure from the request path to determine the correct response ordering.
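Using the toy ReadReorderBuffer sketched earlier, the effect of this output-side selection can be demonstrated: even if the NoC returns the younger response first, nothing is released to the AXI master until the older response for the same AXI ID has arrived. The scenario below is purely illustrative.

```python
rrob = ReadReorderBuffer()
older = rrob.assign_tag(axi_id=3)     # first read issued on AXI ID 3
younger = rrob.assign_tag(axi_id=3)   # second read on the same AXI ID

rrob.complete(younger, b"younger")    # NoC returns the younger response first
assert list(rrob.drain(axi_id=3)) == []          # held back: older is still pending

rrob.complete(older, b"older")        # older response arrives
assert list(rrob.drain(axi_id=3)) == [b"older", b"younger"]   # released in AXI order
```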