To launch vitis_analyzer and view the profiling information in the XRT flow, use the following command.
vitis_analyzer xrt.run_summary
To launch vitis_analyzer and view the profiling information in the XSDB flow, use the following command.
vitis_analyzer aie_trace_profile.run_summary
Example of heat_map Core Metrics and conflicts Memory Metrics
The following image shows the design's active time, stall time, cumulative instruction count, and vector instruction count as part of the heat_map metrics, and the memory conflict time and cumulative memory error time as part of the conflicts metrics, for ten tiles of an example design.
Consider the AI Engine located at (24,2). The stall time (0.043 ms) is 20% of the active time (0.214 ms). During this active time, it performs 179200 vector instructions, which represents 95% of the active time. This is excellent performance and indicates a well-optimized core.
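The 20% figure above can be checked with a short calculation. The following is a minimal sketch that uses only the two times quoted for tile (24,2); the function name stall_ratio is hypothetical and is not part of any Vitis or XRT API.

```python
# Values quoted in the text for the AI Engine tile at (24,2).
active_time_ms = 0.214
stall_time_ms = 0.043


def stall_ratio(stall_ms: float, active_ms: float) -> float:
    """Return stall time as a fraction of active time (hypothetical helper)."""
    if active_ms <= 0:
        raise ValueError("active time must be positive")
    return stall_ms / active_ms


ratio = stall_ratio(stall_time_ms, active_time_ms)
print(f"stall/active = {ratio:.1%}")  # stall/active = 20.1%
```

The same ratio can be computed for each tile in the report to compare tiles quickly.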
Example of stalls Core Metrics and dma_locks Memory Metrics
The following image shows the design's memory stall time, stream stall time, cascade stall time, and lock stall time as part of the stalls metrics, and the cumulative DMA activity time and cumulative DMA locks count as part of the dma_locks metrics, for ten tiles of an example design.
On core (24,2), the DMA has been active for 70.645 ms (77.8 million instructions), and has stalled 298 times. This does not mean that 298 instructions were stalled, because a single stall can last multiple clock cycles.
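The difference between the number of stalls and the number of stalled cycles can be illustrated with a small sketch. The per-cycle trace below is entirely hypothetical and only serves the illustration; it is not the data format used by the profiler, and stall_events_and_cycles is an assumed helper name.

```python
from itertools import groupby


def stall_events_and_cycles(stalled):
    """Count stall events and stalled cycles in a per-cycle trace.

    A stall event is one run of consecutive stalled cycles.
    """
    events = sum(1 for flag, _ in groupby(stalled) if flag)
    cycles = sum(1 for flag in stalled if flag)
    return events, cycles


# Hypothetical trace: True means the cycle was stalled.
trace = [False, True, True, True, False, False, True, False, True, True]
events, cycles = stall_events_and_cycles(trace)
print(events, cycles)  # 3 6
```

In this made-up trace, three stall events account for six stalled cycles, which is why a stall count alone does not give the time lost to stalls.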
Example of execution Core Metrics and conflicts Memory Metrics
The following image shows the design's cumulative instruction count, vector instruction count, load instruction count, and store instruction count as part of the execution metrics, and the memory conflict time and cumulative memory error time as part of the conflicts metrics, for ten tiles of an example design.
Core (24,2) shows some memory conflicts. Although they are minor, their source must be identified. Because they occur so rarely, they might be caused by interference from a DMA or from another kernel's memory accesses.
Example of stream_put_get Core Metrics and dma_stalls_s2mm Memory Metrics
The following image shows the design's stream read instruction count, cascade read instruction count, and cascade write instruction count as part of the stream_put_get metrics, and the s2mm channel 0 stall time and s2mm channel 1 stall time as part of the dma_stalls_s2mm metrics, for ten tiles of an example design.
The graph shows that core (25,1) writes to the cascade stream 3% of the time. Core (24,1) reads from this cascade stream for the same amount of time.
Example of heat_map Core Metrics and dma_locks Memory Metrics
The following image shows the design's active time, stall time, cumulative instruction count, and vector instruction count as part of the heat_map metrics, and the cumulative DMA activity time and cumulative DMA locks count as part of the dma_locks metrics, for ten tiles of an example design.
Viewed together, the cumulative DMA activity time and the cumulative DMA locks count let you check for any discrepancy between the number of lock acquisitions and the amount of data transferred through the DMAs. The relative lock counts can also be used to infer the relative number of iterations of each core.
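One way to read relative lock counts is to scale each tile's count by the smallest non-zero count. The sketch below assumes made-up lock counts for three tiles; the values and the helper name relative_counts are hypothetical and are not taken from the example design or from any tool API.

```python
def relative_counts(lock_counts):
    """Scale each tile's lock count by the smallest non-zero count."""
    nonzero = [count for count in lock_counts.values() if count > 0]
    if not nonzero:
        raise ValueError("at least one tile must have a non-zero lock count")
    base = min(nonzero)
    return {tile: count / base for tile, count in lock_counts.items()}


# Hypothetical cumulative DMA locks counts keyed by (column, row).
lock_counts = {(24, 1): 100, (24, 2): 100, (25, 1): 200}
relative = relative_counts(lock_counts)
for tile, factor in sorted(relative.items()):
    print(tile, factor)
```

With these made-up numbers, tile (25,1) acquires twice as many locks as the other two tiles, which would suggest roughly twice as many iterations if each iteration acquires the same number of locks.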
Example of input_bandwidths Interface Metrics
The following image shows the design's input bandwidth at the PLIO level as part of the input_bandwidths:0 metric in an 8 x 8 cascaded tiles design.
In this graph, the channel 0 bandwidth for all input PLIOs is approximately 95%, which is close to the achievable maximum. After this profiling step, verify that the AI Engines are not starving for data.
Report Consolidation in Vitis Analyzer
During the profiling stage, not all metrics can be collected in the same run. You can run the design in hardware multiple times, rebooting the board between runs and using a different profile metric set in xrt.ini for each run. Typically, for AI Engine interface bandwidth profiling, a single channel (the same for all PLIOs) can be profiled per run. Profiling multiple channels requires multiple runs.
vitis_analyzer can consolidate multiple reports from different runs of the same design. This lets you display the bandwidth of multiple interface channels, for example. While vitis_analyzer is open with the xrt.run_summary of a specific run of the design, you can open other xrt.run_summary reports by clicking the + button in the main toolbar and a window toolbar, as shown below.
After consolidating the profiling data for input PLIO channels 0 and 4 and output PLIO channel 0, vitis_analyzer can display the following table: