The goal of this stage is to profile the design and determine which domain (AI Engine, PL, or NoC) is responsible for the throughput drop that causes the design to stall.
The following figure shows the tasks and techniques available in this stage.
The section below lists the technique available in this stage.
Profiling AI Engine Core, Interface and Memory Module
You can profile the AI Engine Core, Interface, and Memory modules in XRT or XSDB flows. Profiling is a non-intrusive feature that can be enabled at runtime using the xrt.ini file or by running scripts in XSDB. The feature uses performance counters available in the AI Engine array to gather profile data. The amount and type of data gathered are limited by the number of performance counters available.
- Profiling AI Engine Core
  - The profile metric sets available for profiling the AI Engine are as follows:
    - heat map
    - stalls
    - stream puts/gets
    - exceptions
    - tile execution
    - read/write bandwidth related metrics
- Memory Module Profiling
  - The profile metric sets available for profiling the memory module are as follows:
    - conflicts
    - DMA locks
    - DMA stalls
Some examples of AI Engine and Memory Module profiling information displayed in Vitis Analyzer can be found in Figure 3 and Figure 4.
- Interface Bandwidth Profiling
  - Profile metrics to collect interface bandwidth information are also available. Depending on the direction of the port and the type of stall (idle or stalled), you can identify whether the PL is stalling and impacting the throughput of the AI Engine, or vice versa.
In the following tables, the metric sets used for interface profiling are indicated in the column headers:
Table 1. Interface Profiling Metrics: input_bandwidths and input_stalls_idle

| Metric set: input_bandwidths | Metric set: input_stalls_idle, Stalls High | Metric set: input_stalls_idle, Idle High |
| --- | --- | --- |
| Low bandwidth | AI Engine does not consume samples at the right rate. Proceed to stage 4. | PL kernel does not produce samples at the right rate. Proceed to stage 3. |
Table 2. Interface Profiling Metrics: output_bandwidths and output_stalls_idle

| Metric set: output_bandwidths | Metric set: output_stalls_idle, Stalls High | Metric set: output_stalls_idle, Idle High |
| --- | --- | --- |
| Low bandwidth | PL kernel does not consume samples at the right rate. Proceed to stage 3. | AI Engine does not produce samples at the right rate. Proceed to stage 4. |
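The decision logic in Tables 1 and 2 can be written as a small lookup, which may help when triaging many ports at once. This is only an illustrative sketch: the function name `diagnose`, the `DIAGNOSIS` table, and the boolean inputs are hypothetical and are not part of any AMD tool. Deciding what counts as "high" stalls or idle time is left to the reader.

```python
# Hypothetical helper: mirrors Tables 1 and 2 for a port with low bandwidth.
# Keys are (port direction, dominant symptom); values are (diagnosis, next stage).
DIAGNOSIS = {
    ("input", "stalls"): ("AI Engine does not consume samples at the right rate", 4),
    ("input", "idle"): ("PL kernel does not produce samples at the right rate", 3),
    ("output", "stalls"): ("PL kernel does not consume samples at the right rate", 3),
    ("output", "idle"): ("AI Engine does not produce samples at the right rate", 4),
}


def diagnose(direction, stalls_high, idle_high):
    """Return (diagnosis, next_stage) for a low-bandwidth port.

    Returns None when the two flags agree (both high or both low),
    because the tables do not cover those cases.
    """
    if stalls_high == idle_high:
        return None
    symptom = "stalls" if stalls_high else "idle"
    return DIAGNOSIS[(direction, symptom)]


if __name__ == "__main__":
    print(diagnose("input", stalls_high=True, idle_high=False))
    print(diagnose("output", stalls_high=False, idle_high=True))
```

For an input port with high stalls, the lookup points to stage 4 (AI Engine side); for an input port with high idle time, it points to stage 3 (PL side).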
You can run the design multiple times with different parameters in the xrt.ini file, rebooting the board between runs. Vitis Analyzer allows you to consolidate the resulting xrt.run.summary reports so that you have a global view of the various bandwidths, stalls, and idles at the interface level.
For details on how to enable profiling in hardware and interpret the results, see Profiling the AI Engine.
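As a rough sketch of the multi-run approach, the following script builds one xrt.ini text per interface metric set, so that each run collects a different set. The section and option names used here (`[Debug]`, `aie_profile`, `[AIE_profile_settings]`, `tile_based_interface_tile_metrics`) and the `all:` tile selector are assumptions about the XRT configuration format and should be checked against Profiling the AI Engine for your tool version. The helper name `make_xrt_ini` is hypothetical.

```python
import configparser
import io

# The four interface metric sets named in Tables 1 and 2.
METRIC_SETS = [
    "input_bandwidths",
    "input_stalls_idle",
    "output_bandwidths",
    "output_stalls_idle",
]


def make_xrt_ini(interface_metric_set):
    """Return xrt.ini text that requests one interface metric set.

    ASSUMPTION: the section and option names below are not taken from
    this document; verify them before use.
    """
    cfg = configparser.ConfigParser()
    cfg["Debug"] = {"aie_profile": "true"}
    cfg["AIE_profile_settings"] = {
        "tile_based_interface_tile_metrics": f"all:{interface_metric_set}",
    }
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()


# One configuration per run, keyed by metric set name.
configs = {name: make_xrt_ini(name) for name in METRIC_SETS}

if __name__ == "__main__":
    for name, text in configs.items():
        print(f"# xrt.ini for run collecting {name}")
        print(text)
```

Each generated text would be saved as xrt.ini before the corresponding run, and the per-run summary reports can then be consolidated in Vitis Analyzer.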
The profile results allow you to quickly identify the exact AI Engine, input stream or output stream involved in the design performance drop.
Next Stage:
- Proceed to stage 3 if you determine that a PL kernel is causing the performance drop. In stage 3, you can identify the exact PL kernel(s) with the sub-par performance.
- Proceed to stage 4 if you determine that an AI Engine kernel is causing the throughput drop.