Lower Than Expected Bandwidth - 2023.1 English

Versal Adaptive SoC System Integration and Validation Methodology Guide (UG1388)

Document ID: UG1388
Release Date: 2023-05-24
Version: 2023.1 English
  • Check the Connectivity tab. For best performance, use all four NSU ports of the DDRMC.
  • Check the bandwidth values, the traffic class, and the following settings in the NoC QoS tab.
    Note: Bandwidth values revert to their defaults if you change the Connectivity tab.
    • Best Effort: Default setting and the lowest priority. Use for general purpose masters.
    • Low Latency: High-priority, read-only setting. This setting has priority over Best Effort in the NPS and DDRMC and is recommended only for APU cache refills. Overuse can decrease system performance.
    • Isochronous: High-priority setting that uses a timer to take priority over Best Effort and Low Latency.
  • Check the master behavior.
    • Masters that issue short AXI bursts to random addresses generally achieve lower bandwidth.
    • Masters that request more bandwidth than allocated can cause other masters to have lower than expected bandwidth.
    • A single NMU cannot saturate the bandwidth of a DDRMC. Increase the number of masters issuing requests to increase bandwidth from the DDR.
    • Verify the data width of the master (for example, the IP integrator might inherit the wrong data width), and maximize the data width if needed.
    • The size of a short burst must match the NoC packet size.
  • Check the slave behavior.
    • The most common slave for the NoC is the integrated DDRMC. For more information, see the Versal Adaptive SoC Programmable Network on Chip and Integrated Memory Controller LogiCORE IP Product Guide (PG313).
    • Verify that the bus widths and AXI clock frequencies match between the master at the NoC ingress and the slave at the NoC egress.
    • Check for a large average latency stack-up. Measure the latency at both the master and the slave.
      • Start with the slave. If the latency is already excessive there, the issue is with the slave.
      • If the slave and master latencies are within 5% of each other and both average more than 1000 clock cycles, the issue is likely within the NoC, and the NoC might need to be constrained further.
    • Perform a secondary analysis.
      • Inspect traffic through the NoC and look for switch contention, such as virtual channel (VC) assignment to QoS traffic classes.
      • Run an integrated logic analyzer (ILA) on the slave AXI4-Stream interfaces, and compute the bus efficiency.
      • Redesign with an AXI performance monitor (APM) on the slave and master AXI4-Stream interfaces. Set up and capture extended metrics: bus efficiency, dead cycles, and slave-ready delays.
    • The DDRMC slaves contain information about activates, bus turnarounds, and so on. For more information, see the Versal Adaptive SoC Programmable Network on Chip and Integrated Memory Controller LogiCORE IP Product Guide (PG313).
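The latency stack-up check in the slave behavior steps above can be sketched as a small decision function. This is only an illustration, not part of any AMD tool. The function name, the return labels, and the slave_excessive_clks parameter are hypothetical: the guide does not define what counts as "excessive" slave latency, so the caller must supply that limit. Reading "within 5%" as a relative difference against the larger of the two values is also an assumption.

```python
# Thresholds quoted from the guidance above.
LATENCY_LIMIT_CLKS = 1000   # both averages must exceed this to suspect the NoC
MATCH_TOLERANCE = 0.05      # master and slave latencies "within 5%"


def triage_latency(master_avg_clks: float,
                   slave_avg_clks: float,
                   slave_excessive_clks: float) -> str:
    """Classify where a latency problem most likely sits.

    slave_excessive_clks is a caller-supplied limit (an assumption; the
    guide does not give a number for "excessive" slave latency).
    Returns "slave", "noc", or "inconclusive".
    """
    if master_avg_clks <= 0 or slave_avg_clks <= 0:
        raise ValueError("average latencies must be positive")

    # Step 1: start with the slave. Excessive latency there points at the slave.
    if slave_avg_clks > slave_excessive_clks:
        return "slave"

    # Step 2: compare master and slave averages (relative difference is assumed).
    rel_diff = abs(master_avg_clks - slave_avg_clks) / max(master_avg_clks,
                                                            slave_avg_clks)
    both_high = (master_avg_clks > LATENCY_LIMIT_CLKS
                 and slave_avg_clks > LATENCY_LIMIT_CLKS)
    if rel_diff <= MATCH_TOLERANCE and both_high:
        return "noc"

    return "inconclusive"
```

For example, with a caller-chosen slave limit of 1500 clock cycles, averages of 1230 (master) and 1200 (slave) fall within 5% of each other and both exceed 1000, so the function points at the NoC.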
Tip: You can measure NoC performance using the AMD open source ChipScoPy API. For more information, see the GitHub repository at http://www.github.com/Xilinx/chipscopy.
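The extended metrics named in the secondary analysis (bus efficiency, dead cycles, and slave-ready delays) can be estimated offline from a per-cycle capture of the valid and ready handshake signals, for instance from an exported ILA waveform. The sketch below is a minimal illustration under stated assumptions. The function name and the metric definitions are hypothetical: it counts a cycle with valid and ready both high as a transfer, a cycle with valid high and ready low as a slave-ready delay, and a cycle with valid low as a dead cycle. The APM might define these counters differently.

```python
from typing import Dict, Iterable, Tuple


def bus_metrics(samples: Iterable[Tuple[int, int]]) -> Dict[str, float]:
    """Summarize a capture of (valid, ready) pairs, one pair per clock cycle.

    Metric definitions are assumptions made for this sketch:
      transfers          -> valid and ready both high
      slave_ready_delays -> valid high, ready low (master stalled by slave)
      dead_cycles        -> valid low (nothing offered on the bus)
      efficiency         -> transfers / total cycles
    """
    total = transfers = stalled = dead = 0
    for valid, ready in samples:
        total += 1
        if valid and ready:
            transfers += 1
        elif valid:
            stalled += 1
        else:
            dead += 1

    if total == 0:
        raise ValueError("capture contains no samples")

    return {
        "total_cycles": total,
        "transfers": transfers,
        "slave_ready_delays": stalled,
        "dead_cycles": dead,
        "efficiency": transfers / total,
    }
```

A capture of four cycles such as [(1, 1), (1, 0), (0, 1), (1, 1)] contains two transfers, one slave-ready delay, and one dead cycle, giving an efficiency of 0.5.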