Lower Than Expected Bandwidth - 2022.2 English - UG1388

Versal ACAP System Integration and Validation Methodology Guide (UG1388)

Document ID: UG1388
Release Date: 2022-11-16
Version: 2022.2 English
  • Check the Connectivity tab. For best performance, use all four NSU ports of the DDRMC.
  • Check the bandwidth values, the traffic class, and the following settings in the NoC QoS tab.
    Note: Bandwidth values return to default values if you change the Connectivity tab.
    • Best Effort: Default setting and the lowest priority. Use for general purpose masters.
    • Low Latency: High priority read-only setting. This setting has priority over Best Effort in the NPS/DDRMC and is recommended only for APU cache refills. Overuse can decrease system performance.
    • Isochronous: High priority setting. This setting uses a timer to take priority over Best Effort and Low Latency.
  • Check the master behavior.
    • Masters that issue short AXI bursts with random access patterns generally achieve lower bandwidth.
    • Masters that request more bandwidth than allocated can cause other masters to have lower than expected bandwidth.
    • A single NMU cannot saturate the bandwidth of a DDRMC. Increase the number of masters issuing requests to increase bandwidth from the DDR.
    • Confirm the data width of the master (for example, IP integrator might inherit the wrong data width), and maximize the data width if needed.
    • A short burst size must match the NoC packet size.
  • Check the slave behavior.
    • The most common slave for the NoC is the integrated DDRMC. For information, see the Versal ACAP Programmable Network on Chip and Integrated Memory Controller LogiCORE IP Product Guide (PG313).
    • Verify that the bus width and AXI clock frequencies match between the ingress master to the NoC and the egress slave connection.
    • Check for large average latency stackup. Measure this at both the master and the slave.
      • Start with the slave. If there is already excessive latency, the issue is with the slave.
      • If the average latencies at the slave and the master are within 5% of each other and both are greater than 1000 clock cycles, the issue is likely within the NoC, and the NoC might need to be constrained further.
    • Perform a secondary analysis.
      • Inspect traffic through the NoC and look for switch contention, such as virtual channel (VC) assignment to QoS traffic classes.
      • Run the integrated logic analyzer (ILA) on the slave AXI4-Stream interfaces, and compute the bus efficiency.
      • Rebuild the design with an AXI performance monitor (APM) on the slave and master AXI4-Stream interfaces. Set up and capture extended metrics: bus efficiency, dead cycles, and slave-ready delays.
    • The DDRMC slaves contain information about activates, bus turn-arounds, etc. For information, see Versal ACAP Programmable Network on Chip and Integrated Memory Controller LogiCORE IP Product Guide (PG313).
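The latency stackup check in the slave-behavior steps can be sketched as a small decision helper. This is an illustrative sketch only: the function name, the return strings, and the `slave_limit_clks` parameter are hypothetical and not part of any Xilinx tool. The guide does not define what counts as "excessive" slave latency, so the caller must supply that limit. The 5% and 1000-clock values come from the checklist.

```python
# Hypothetical helper that encodes the latency triage steps from the checklist.
RELATIVE_TOLERANCE = 0.05   # master and slave "within 5%" (from the guide)
HIGH_LATENCY_CLKS = 1000    # "greater than 1000" clock cycles (from the guide)


def triage_latency(master_avg_clks: float,
                   slave_avg_clks: float,
                   slave_limit_clks: float) -> str:
    """Return a coarse hint about where the latency problem is likely to be.

    slave_limit_clks is an assumption: the largest average latency you
    consider acceptable for this slave. The guide does not give a number.
    """
    # Step 1: start with the slave.
    if slave_avg_clks > slave_limit_clks:
        return "slave"

    # Step 2: compare master and slave averages.
    largest = max(master_avg_clks, slave_avg_clks)
    if largest == 0:
        return "inconclusive"
    relative_diff = abs(master_avg_clks - slave_avg_clks) / largest
    both_high = (master_avg_clks > HIGH_LATENCY_CLKS
                 and slave_avg_clks > HIGH_LATENCY_CLKS)
    if relative_diff <= RELATIVE_TOLERANCE and both_high:
        return "noc"

    return "inconclusive"


if __name__ == "__main__":
    # Example values are made up for illustration.
    print(triage_latency(1200, 1180, slave_limit_clks=1500))
    print(triage_latency(3000, 2900, slave_limit_clks=1500))
```

A result of "inconclusive" means only that neither rule in the checklist fired, so the secondary analysis steps still apply.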
Tip: You can measure NoC performance using the Xilinx open source ChipScoPy API. For more information, see the GitHub repository at http://www.github.com/Xilinx/chipscopy.
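For the secondary analysis, the bus efficiency computation on captured interface samples can be sketched as follows. This is a minimal sketch under stated assumptions: it does not use the ChipScoPy API, the function names are hypothetical, and the input is assumed to be one (valid, ready) pair per clock cycle exported from an ILA capture. The metric definitions are also assumptions: efficiency is the fraction of cycles with a completed transfer, a dead cycle is a cycle with valid deasserted, and a slave-ready delay cycle is a cycle with valid asserted and ready deasserted.

```python
from typing import Dict, Iterable, Tuple


def bus_metrics(samples: Iterable[Tuple[int, int]]) -> Dict[str, float]:
    """Summarize per-cycle (valid, ready) samples from one interface."""
    total = transfers = ready_delay = dead = 0
    for valid, ready in samples:
        total += 1
        if valid and ready:
            transfers += 1      # data moved on this cycle
        elif valid:
            ready_delay += 1    # master waiting on the slave
        else:
            dead += 1           # nothing offered on the bus
    if total == 0:
        raise ValueError("no samples captured")
    return {
        "efficiency": transfers / total,
        "ready_delay_cycles": ready_delay,
        "dead_cycles": dead,
    }


def effective_bandwidth(efficiency: float,
                        data_width_bits: int,
                        clock_hz: float) -> float:
    """Achieved bandwidth in bytes per second for the measured efficiency."""
    return efficiency * (data_width_bits / 8) * clock_hz


if __name__ == "__main__":
    # Made-up four-cycle capture: two transfers, one stall, one dead cycle.
    capture = [(1, 1), (1, 0), (0, 1), (1, 1)]
    metrics = bus_metrics(capture)
    print(metrics)
    # The 128-bit width and 300 MHz clock are example values only.
    print(effective_bandwidth(metrics["efficiency"], 128, 300e6))
```

Comparing the effective bandwidth with the value requested in the NoC QoS tab shows whether the interface itself is the limit or the request is being throttled elsewhere.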