To launch vitis_analyzer and view the profiling information in the XRT flow, use the following command:
vitis_analyzer xrt.run_summary
To launch vitis_analyzer and view the profiling information in the XSDB flow, use the following command:
vitis_analyzer aie_trace_profile.run_summary
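When both flows are used in the same project, a small wrapper can select the right summary file. The following is a minimal sketch, not part of the tool: the helper names analyzer_command and launch and the flow labels "xrt" and "xsdb" are hypothetical, and only the two file names and the vitis_analyzer executable come from the commands above.

```python
import shutil
import subprocess

# Summary file names are taken from the two commands above.
# The dictionary keys are hypothetical labels chosen for this sketch.
RUN_SUMMARY = {
    "xrt": "xrt.run_summary",
    "xsdb": "aie_trace_profile.run_summary",
}


def analyzer_command(flow):
    """Return the vitis_analyzer argument list for the given flow label."""
    try:
        return ["vitis_analyzer", RUN_SUMMARY[flow.lower()]]
    except KeyError:
        raise ValueError(f"unknown flow: {flow!r}") from None


def launch(flow):
    """Run vitis_analyzer for the flow; fail early if it is not on PATH."""
    cmd = analyzer_command(flow)
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError("vitis_analyzer is not on PATH")
    return subprocess.run(cmd, check=True)


# Show the command that would be run, without launching the GUI.
print(" ".join(analyzer_command("xsdb")))
```

Calling launch("xrt") from the directory that contains the summary file would then open the XRT flow results, assuming the tool environment has already been set up in the shell.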
Example of heat_map Core Metrics and conflicts Memory Metrics
The following image shows the design's active time, stall time, cumulative instruction count, and vector instruction count as part of the heat_map metrics, and the memory conflict time and cumulative memory error time as part of the conflicts metrics, for ten tiles of an example design.
Consider the AI Engine located at (15,0). During its active time (5.120 ms), it executes 5,120,000 vector instructions, which account for 87% of the active time. This is excellent performance and indicates a well-optimized core.
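The 87% figure can be cross-checked by converting the active time to clock cycles. The sketch below rests on two assumptions that the text does not state: an AI Engine clock of 1.15 GHz (chosen here because it reproduces the quoted percentage) and one cycle per vector instruction. The function name vector_fraction is hypothetical.

```python
def vector_fraction(active_time_s, vector_instructions, clock_hz):
    """Fraction of active cycles spent issuing vector instructions.

    Assumes each vector instruction occupies exactly one clock cycle.
    """
    active_cycles = active_time_s * clock_hz
    return vector_instructions / active_cycles


# Values for tile (15,0) from the text; the clock frequency is an assumption.
ACTIVE_TIME_S = 5.120e-3
VECTOR_INSTRUCTIONS = 5_120_000
ASSUMED_CLOCK_HZ = 1.15e9

fraction = vector_fraction(ACTIVE_TIME_S, VECTOR_INSTRUCTIONS, ASSUMED_CLOCK_HZ)
print(f"{fraction:.1%}")  # prints 87.0%
```

With a different clock frequency the same counts give a different ratio, so use the frequency of the actual design when checking other tiles.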
Example of stalls Core Metrics and dma_locks Memory Metrics
The following image shows the design's memory stall time, stream stall time, cascade stall time, and lock stall time as part of the stalls metrics, and the cumulative DMA activity time and cumulative DMA locks count as part of the dma_locks metrics, for ten tiles of an example design.
On core (24,2), the DMA has been active for 70.645 ms (77.8 million instructions) but has been stalled 298 times.
Example of execution Core Metrics and conflicts Memory Metrics
The following image shows the design's cumulative instruction count, vector instruction count, load instruction count, and store instruction count as part of the execution metrics, and the memory conflict time and cumulative memory error time as part of the conflicts metrics, for ten tiles of an example design.
Although they are minor, core (15,1) experiences some memory conflicts whose source should be identified. Because they occur so rarely, they might be caused by interference from DMA or from another kernel's memory accesses.
Example of read_throughputs and write_throughputs AI Engine Metrics and dma_stalls_s2mm and dma_stalls_mm2s AI Engine Memory Metrics
The following image shows the design's stream and cascade read and write instruction counts as part of the read_throughputs and write_throughputs metrics, and the s2mm and mm2s channel0 and channel1 stall times as part of the dma_stalls_s2mm and dma_stalls_mm2s metrics, for ten tiles of an example design.