Performance modeling in Vivado IP integrator can be divided into the following two stages:
- NoC/DDR memory interface dataflow modeling
- Accelerator block dataflow modeling
In the first stage, you model the NoC dataflow using the AMD traffic generator IP, which can be configured to generate transactions that resemble the accelerator dataflow. For example, if the accelerator must process a three-dimensional data cube that is stored linearly in DDR memory, the addresses that the NoC master fetches are not sequential. You can configure the traffic generators for the required addressing mode (e.g., three-dimensional) and monitor performance at the NoC-PL interface.
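The following minimal Python sketch illustrates why the fetch addresses are not linear. It assumes, hypothetically, a cube stored in row-major order (x fastest, then y, then z), computes the byte addresses a master would read for a smaller sub-cube, and merges contiguous addresses into (address, length) bursts. The function names and parameters are illustrative only; they are not the traffic generator's configuration interface, and real AXI bursts must also respect maximum burst length and 4 KB boundary rules, which this sketch ignores.

```python
def cube_fetch_addresses(base, dims, elem_bytes, origin, block):
    """Yield byte addresses for a sub-cube of a row-major data cube.

    Hypothetical parameters:
    base       -- base byte address of the cube in DDR memory
    dims       -- (depth, height, width) of the full cube, in elements
    elem_bytes -- size of one element in bytes
    origin     -- (z, y, x) starting corner of the sub-cube
    block      -- (bz, by, bx) size of the sub-cube, in elements
    """
    _, height, width = dims
    z0, y0, x0 = origin
    bz, by, bx = block
    for z in range(z0, z0 + bz):
        for y in range(y0, y0 + by):
            for x in range(x0, x0 + bx):
                # Row-major linear index of element (z, y, x)
                index = (z * height + y) * width + x
                yield base + index * elem_bytes


def to_bursts(addresses, elem_bytes):
    """Merge contiguous element addresses into (start_address, length_bytes) bursts."""
    bursts = []
    for addr in addresses:
        if bursts and bursts[-1][0] + bursts[-1][1] == addr:
            start, length = bursts[-1]
            bursts[-1] = (start, length + elem_bytes)  # extend the current burst
        else:
            bursts.append((addr, elem_bytes))  # address jump: start a new burst
    return bursts


# 4x4x4 cube of 4-byte elements; fetch the 2x2x2 sub-cube at the origin
addrs = list(cube_fetch_addresses(0, (4, 4, 4), 4, (0, 0, 0), (2, 2, 2)))
print(to_bursts(addrs, 4))  # [(0, 8), (16, 8), (64, 8), (80, 8)]
```

Each burst covers one contiguous run along x. Consecutive bursts are separated by the row stride (16 bytes here) within a plane and by the plane stride (64 bytes here) between planes, which is the kind of strided pattern a three-dimensional addressing mode would need to reproduce.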
If the throughput is lower than expected, usually because of low DRAM efficiency, you can tune the DRAM address map to make the DRAM interface more efficient. In addition, the NoC provides quality of service (QoS) options. Depending on the traffic class, you can tune the QoS values for each NoC master and slave to meet application needs. For example, you can assign the low latency traffic class to NoC masters for video applications that require the lowest latency and assign the best effort traffic class to other NoC masters.
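To illustrate why the address map affects DRAM efficiency, here is a minimal Python sketch of a simplified row-buffer model. The geometry (4 column bits, 2 bank bits, 8 row bits), the two mapping functions, and the access pattern are hypothetical and are not the DDR memory controller's actual address map options; timing, refresh, bank groups, and ranks are ignored. Two sequential streams are interleaved, as when two NoC masters share one controller, and the sketch counts row-buffer hits under each mapping.

```python
# Hypothetical DRAM geometry; addresses are in units of one column access
COL_BITS, BANK_BITS, ROW_BITS = 4, 2, 8


def row_bank_column(addr):
    """Hypothetical map: column in the low bits, then bank, then row."""
    col = addr & ((1 << COL_BITS) - 1)
    bank = (addr >> COL_BITS) & ((1 << BANK_BITS) - 1)
    row = addr >> (COL_BITS + BANK_BITS)
    return bank, row, col


def bank_row_column(addr):
    """Hypothetical map: column in the low bits, then row, then bank."""
    col = addr & ((1 << COL_BITS) - 1)
    row = (addr >> COL_BITS) & ((1 << ROW_BITS) - 1)
    bank = addr >> (COL_BITS + ROW_BITS)
    return bank, row, col


def row_hits(addresses, decode):
    """Count accesses that find their row already open in the target bank."""
    open_row = {}  # bank -> currently open row
    hits = 0
    for addr in addresses:
        bank, row, _ = decode(addr)
        if open_row.get(bank) == row:
            hits += 1
        else:
            open_row[bank] = row  # row miss: precharge and activate the new row
    return hits


def interleave(*streams):
    """Alternate accesses from each stream: a0, b0, a1, b1, ..."""
    return [a for group in zip(*streams) for a in group]


pattern = interleave(range(0, 64), range(8192, 8192 + 64))
print(row_hits(pattern, row_bank_column))  # 0
print(row_hits(pattern, bank_row_column))  # 120
```

In this model, the row-bank-column map places both streams in the same bank at different rows, so every access is a row miss, whereas the bank-row-column map places the streams in different banks and most accesses hit an open row. Other traffic patterns can favor a different mapping, so the choice depends on the traffic being modeled.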
In the second stage, you model the accelerator block by generating traffic that mimics the actual dataflow from the NoC/DDR memory. If the accelerator interface uses the AXI4-Stream protocol (e.g., AI Engine blocks), you can use AMD traffic generators or simulation PLIOs to model the traffic and tune performance. You can then tune the NoC and accelerator configuration based on the performance reported by the monitor blocks.
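When reading monitor results, it can help to compare the measured throughput against both the interface's peak bandwidth and the application's requirement. The following Python sketch assumes a hypothetical monitor that reports a byte count and a cycle count over a measurement window at a known clock frequency; the class and field names are illustrative and do not correspond to any specific AMD monitor's output format. The peak bandwidth assumes one AXI4-Stream data beat per clock cycle.

```python
from dataclasses import dataclass


@dataclass
class MonitorSample:
    """Hypothetical counters captured by a performance monitor over one window."""
    bytes_transferred: int
    cycles: int
    clock_hz: float

    def throughput_bytes_per_s(self) -> float:
        # Bytes per cycle multiplied by cycles per second
        return self.bytes_transferred * self.clock_hz / self.cycles


def stream_peak_bytes_per_s(data_width_bits: int, clock_hz: float) -> float:
    """Peak bandwidth of a stream interface that transfers one beat per cycle."""
    return data_width_bits / 8 * clock_hz


def meets_target(sample: MonitorSample, required_bytes_per_s: float) -> bool:
    """True if the measured throughput satisfies the application requirement."""
    return sample.throughput_bytes_per_s() >= required_bytes_per_s


# Hypothetical numbers: a 128-bit stream at 250 MHz moves 12 MB in 1M cycles
sample = MonitorSample(bytes_transferred=12_000_000, cycles=1_000_000, clock_hz=250e6)
peak = stream_peak_bytes_per_s(128, 250e6)
print(sample.throughput_bytes_per_s() / peak)  # 0.75
```

In this example the stream runs at 75% of its peak (3 GB/s out of 4 GB/s). If the application required more than the measured 3 GB/s, you would revisit the NoC QoS settings, the DRAM address map, or the accelerator configuration and rerun the model.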
For more information, see the Versal Adaptive SoC Programmable Network on Chip and Integrated Memory Controller LogiCORE IP Product Guide (PG313) and the NoC DDR Memory Controller Versal Device Architecture Tutorials.