Analyzing PL Kernel Performance in Simulation

Versal Adaptive SoC System Integration and Validation Methodology Guide (UG1388)

Document ID: UG1388

Release Date: 2023-11-15

Version: 2023.2 English

HLS

All kernels developed by HLS can be optimized using compiler directives and HLS pragmas. The Vitis HLS compiler generates detailed reports containing Fmax, resource utilization, and performance information. In addition to the summary reports, the schedule viewer provides a visual representation of how the design is built and how the operations are scheduled. You can use this view to help identify suboptimal portions of the synthesized design.

You can supplement these compile-time reports by running the HLS cosimulation flow. When you run this flow, Vitis HLS automatically extracts performance data from the simulation results and reports additional performance information, such as minimum, maximum, and average running times and FIFO high watermarks. AMD recommends using all of these analysis capabilities before integrating the HLS kernel in the system.

Note: A kernel that does not meet its performance targets in a standalone context will not meet them in the complete system.

Many factors influence the performance of an HLS kernel, including interface properties, loop-level parallelism, task-level parallelism, and more. In particular, understanding the concepts of initiation interval (II) and dataflow is essential to achieving good results. Initiation interval is measured in clock cycles and indicates how often a particular loop or process can start a new iteration. For example, if a loop is successfully synthesized with II=1, then in the resulting RTL a new loop iteration starts every cycle. II is closely related to throughput, a key performance metric. Dataflow is a performance optimization that takes advantage of task-level parallelism. Whenever possible, dataflow allows different sub-processes in the design to run concurrently instead of sequentially. Achieving optimal results with dataflow requires a suitable code structure. For more information about initiation interval, dataflow, and other HLS performance optimizations, see the Vitis High-Level Synthesis User Guide (UG1399).
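As an illustration of the two concepts above, the following is a minimal sketch of a kernel that requests II=1 on its loops and uses dataflow to let two stages overlap. The function names (`scale_stage`, `offset_stage`, `example_kernel`), the frame size `N`, and the arithmetic are hypothetical and chosen only for illustration. The pragmas express a request to the compiler; whether II=1 is actually achieved depends on the synthesized design and should be confirmed in the Vitis HLS reports. A standard C++ compiler ignores the pragmas, so the same code can be checked functionally before synthesis.

```cpp
#include <cstddef>

// Hypothetical frame size, chosen only for this sketch.
constexpr std::size_t N = 1024;

// Stage 1: multiply every sample by a constant.
// The PIPELINE pragma asks the tool to start a new iteration every cycle.
static void scale_stage(const int in[N], int out[N]) {
    for (std::size_t i = 0; i < N; ++i) {
#pragma HLS PIPELINE II=1
        out[i] = in[i] * 3;
    }
}

// Stage 2: add a constant offset to every sample.
static void offset_stage(const int in[N], int out[N]) {
    for (std::size_t i = 0; i < N; ++i) {
#pragma HLS PIPELINE II=1
        out[i] = in[i] + 7;
    }
}

// Top-level function (assumed name). The DATAFLOW pragma asks the tool to
// run the two stages as concurrent tasks connected by the intermediate
// buffer, instead of running them strictly one after the other.
void example_kernel(const int in[N], int out[N]) {
#pragma HLS DATAFLOW
    int tmp[N];  // intermediate buffer: one producer, one consumer
    scale_stage(in, tmp);
    offset_stage(tmp, out);
}
```

The structure is deliberately simple: each intermediate variable is written by exactly one stage and read by exactly one stage, which is the kind of code structure dataflow generally needs. As a rough guide, a pipelined loop with N iterations takes about (N - 1) × II plus the latency of one iteration, so an II of 2 instead of 1 roughly doubles the loop's running time for large N.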

RTL

Simulate RTL kernels with standard RTL simulation tools, such as the Vivado simulator, rather than in the Vitis environment. All standard RTL simulation best practices apply to RTL kernels. Check each RTL kernel for functional correctness and determine the best performance it can achieve. Make sure each RTL kernel meets its performance goals before adding it to the larger system.

All RTL kernels you develop must be simulated at the block level, either using custom RTL test benches or using the AMD LogiCORE™ AXI Verification IP (VIP) provided in the AMD Vivado™ IP Catalog. For more information, see the AXI Verification IP LogiCORE IP Product Guide (PG267).

Tip: Additional performance counters can be written in RTL to count cycles in the PL and calculate latency and throughput to/from the AI Engines.