Performance Setup - 2021.2 English

Versal ACAP System Integration and Validation Methodology Guide (UG1388)

Document ID: UG1388
Release Date: 2021-11-19
Version: 2021.2 English

To maximize performance when using CPM, set up each interface as follows:

Master AXI4 Ports
Because there are two master AXI4 MM ports, you must balance traffic across them to maximize bus utilization at both ports.
  • When to use:
    • Calculate your aggregated PCIe link throughput: This is PCIe Link Speed * PCIe Link Width.
    • Calculate the throughput of a single AXI4 MM port: This is 128 bits * CPMTOPSWCLK frequency. The frequency is selected in the CPM GUI and depends on the speed grade; consult your device data sheet.
    • If PCIe link throughput is greater than AXI4 MM port throughput, you must use both ports.
      Note: If the two throughput figures are nearly equal, weigh the following overheads against the added design complexity of using both ports.
      • The PCIe link has some TLP overhead, typically ~20-25%, depending on packet sizes and on the Max Payload Size and Max Read Request Size settings. Unaligned address transfers or scattered host memory can also affect this number because they make DMA transfers less efficient.
      • The NoC has some overhead, typically ~6% on the write side due to metadata insertion, but is nearly optimal on the read side.
      • If using DDR memory, there might be additional overhead depending on the traffic pattern and the DDR bank/column/row settings.
  • How to use:
    • A packet must not be split across both ports. The two ports must operate as independently as possible to avoid head-of-line blocking caused by AXI4 ID and PCIe tag ordering.
      QDMA
      Split your traffic based on Queue ID. Allocate some queues to the first port (AXI4-MM0) and the rest to the second port (AXI4-MM1).
      XDMA
      Traffic is split automatically based on DMA channel ID. Even DMA channels route to AXI4-MM0 and odd DMA channels route to AXI4-MM1.
      AXI4 Bridge
      Only one port, AXI4-MM0, is used. Performance is therefore expected to max out at the AXI4 MM port throughput and might not reach the PCIe link throughput capability.
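The port assignment rules above can be modeled in a few lines. The following Python sketch only illustrates the policy: the function names, the `first_mm1_queue` threshold, and the eight-queue setup are hypothetical and not part of any Xilinx API, and how a QDMA queue is actually bound to a port is not shown here.

```python
from collections import Counter


def xdma_port(channel_id: int) -> int:
    """XDMA rule from the text: even channels -> AXI4-MM0, odd -> AXI4-MM1."""
    return channel_id % 2


def qdma_port(queue_id: int, first_mm1_queue: int) -> int:
    """QDMA: user-chosen split by Queue ID (hypothetical threshold policy).

    Queues below first_mm1_queue use AXI4-MM0 (0), the rest use AXI4-MM1 (1).
    """
    return 0 if queue_id < first_mm1_queue else 1


# Hypothetical setup: 8 queues, with AXI4-MM1 taking queues 4 and above.
queue_ports = [qdma_port(q, first_mm1_queue=4) for q in range(8)]

# Count queues per port to check that the split is balanced.
load = Counter(queue_ports)
print(f"queues on MM0: {load[0]}, queues on MM1: {load[1]}")
```

An even queue count per port balances traffic only if the queues carry similar loads; with uneven queues, the threshold would need to follow expected bandwidth instead.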
Slave AXI4 MM Port
Because there is only one slave AXI4 MM port, performance through this port is expected to max out at the AXI4 MM port throughput and might not reach the PCIe link throughput capability.
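The throughput comparison that bounds a single AXI4 MM port, master or slave, can be sketched as follows. This is a minimal Python illustration of the arithmetic described in this section. The function names are hypothetical, the 1000 MHz clock is a placeholder rather than a data sheet value, and the overhead fractions are the typical figures quoted above, not guaranteed numbers.

```python
def pcie_link_gbps(link_speed_gtps: float, link_width: int) -> float:
    """Aggregated PCIe link throughput: link speed * link width (raw Gb/s)."""
    return link_speed_gtps * link_width


def axi_mm_port_gbps(clk_mhz: float, data_width_bits: int = 128) -> float:
    """Throughput of one AXI4 MM port: 128 bits * clock frequency (Gb/s)."""
    return data_width_bits * clk_mhz / 1000.0


def needs_both_ports(link_gbps: float, port_gbps: float,
                     tlp_overhead: float = 0.0,
                     noc_overhead: float = 0.0) -> bool:
    """True if the PCIe rate after TLP overhead exceeds what one port
    can carry after NoC overhead."""
    usable_link = link_gbps * (1.0 - tlp_overhead)
    usable_port = port_gbps * (1.0 - noc_overhead)
    return usable_link > usable_port


# Example: 16 GT/s x16 link against one port at an assumed 1000 MHz clock.
link = pcie_link_gbps(16.0, 16)   # 256 Gb/s raw
port = axi_mm_port_gbps(1000.0)   # 128 Gb/s per port
print(needs_both_ports(link, port, tlp_overhead=0.25))
```

With these placeholder numbers the link exceeds one port even after 25% TLP overhead, so a master-side design would use both ports, while a slave-side design would be capped at the single-port figure.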
Master and Slave AXI4-ST Ports
In this mode, CPM AXI4-ST ports can only connect directly to the PL, so you are only required to operate your design at the same frequency and data bus width as the AXI4-ST interface from the CPM or PCIe PL IP.

PL PCIe can only use AXI4-ST ports, so you are only required to operate your design at the same frequency and data bus width as the AXI4-ST interface from the CPM or PCIe PL IP.