System Performance Closure - 2024.1 English

Versal Adaptive SoC System Integration and Validation Methodology Guide (UG1388)

Document ID: UG1388
Release Date: 2024-06-19
Version: 2024.1 English

AMD Versal™ devices are built around heterogeneous compute engines that are connected to each other by the NoC or the PL, and connected to the external system through high-performance transceivers and I/Os. During the system application planning and mapping phases, the device interfaces and overall compute requirements are used to specify the target performance of each compute and control function implemented in the device. Each function is designed and mapped to the most suitable or available hardware resource using the corresponding programming language and compilation software (for example, system software for the embedded processor system, C/C++ for AI Engines or PL kernels, and RTL for higher-performance PL kernels or firmware).

Individual design teams must validate both functionality and expected performance at the function level before integrating each function into a subset of the system application or into the complete system. During the integration phase, functionality can break and performance can degrade. Because the system applications supported by Versal devices are complex and heterogeneous, the analysis and debug methodology must be understood and planned ahead of time. The AMD Vitis™ and AMD Vivado™ tools are comprehensive and complementary design environments that provide the features needed to simulate functionality, report design characteristics, and measure or probe data in hardware.

Important: Application performance requirements are usually met by creating the proper connectivity architecture between key blocks, either PL-based or hard IP, with the right throughput and latency budgets for control and compute blocks, and the right Quality-of-Service (QoS) constraints for data movement between blocks and storage. Trying to achieve the best possible clock frequency for all PL blocks is usually not necessary to meet performance goals and can potentially increase power consumption by increasing the logic area with no relevant performance gain.
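The point about throughput budgets versus clock frequency can be illustrated with simple arithmetic. The following is a minimal sketch, not part of any AMD tool: the `Link` class, the `min_clock_mhz` helper, and all numeric values are hypothetical, and it assumes that the throughput of a streaming interface is approximately data width times clock frequency times the fraction of cycles that carry valid data.

```python
from dataclasses import dataclass


@dataclass
class Link:
    """Hypothetical model of a streaming connection between two blocks."""
    name: str
    width_bits: int           # data-path width in bits
    clock_mhz: float          # interface clock in MHz
    efficiency: float = 1.0   # assumed fraction of cycles carrying valid data

    def throughput_gbps(self) -> float:
        # bits/cycle * Mcycles/s = Mb/s; divide by 1000 to get Gb/s
        return self.width_bits * self.clock_mhz * self.efficiency / 1000.0


def min_clock_mhz(required_gbps: float, width_bits: int,
                  efficiency: float = 1.0) -> float:
    """Lowest clock (MHz) at which a link of this width meets the budget."""
    return required_gbps * 1000.0 / (width_bits * efficiency)


if __name__ == "__main__":
    # Illustrative numbers only: a 128-bit stream that must sustain 16 Gb/s.
    required = 16.0
    link = Link("pl_kernel_to_noc", width_bits=128, clock_mhz=300.0)
    print(f"{link.name}: {link.throughput_gbps():.1f} Gb/s available, "
          f"{required:.1f} Gb/s required")
    print(f"minimum clock to meet budget: {min_clock_mhz(required, 128):.1f} MHz")
```

Under these assumptions the link already exceeds its budget at 300 MHz, and the budget would still be met at a much lower clock. Closing timing at a higher frequency for this block would not improve application performance, which is the situation the note above warns about.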

You can use AXI traffic generators as an alternative to file-based inputs and outputs. Whereas file-based inputs and outputs are static and limited, AXI traffic generators produce and consume data dynamically. For more information, see the AXIS External Traffic Generator Feature Tutorial.
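The difference between a static, file-like stimulus and a dynamic one can be sketched conceptually as follows. This is plain Python and does not use the API of the AMD external traffic generator. The `Beat` type and the `static_beats` and `dynamic_beats` functions are hypothetical names, and each beat is modeled only as a data word plus an end-of-packet flag in the style of AXI4-Stream TLAST.

```python
import random
from typing import Iterator, List, Tuple

# Hypothetical beat model: (data word, end-of-packet flag)
Beat = Tuple[int, bool]


def static_beats(samples: List[int]) -> Iterator[Beat]:
    """Replay a fixed list of words as one packet, as a file-based input would."""
    last = len(samples) - 1
    for i, value in enumerate(samples):
        yield value, i == last


def dynamic_beats(packet_len: int, packets: int, seed: int = 0) -> Iterator[Beat]:
    """Produce packets on demand; length and count are chosen at run time."""
    rng = random.Random(seed)  # seeded so that a run can be reproduced
    for _ in range(packets):
        for i in range(packet_len):
            yield rng.getrandbits(32), i == packet_len - 1


if __name__ == "__main__":
    print(list(static_beats([1, 2, 3])))
    for data, last in dynamic_beats(packet_len=4, packets=2):
        print(f"data=0x{data:08x} last={last}")
```

The static source can only replay what was captured ahead of time, while the dynamic source can vary packet size, packet count, and content for each run without regenerating input files.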

The following sections recommend step-by-step analysis methods to identify application bottlenecks, identify performance mismatches around one or several functions, and address common performance issues based on the targeted device resource.