The goal of this stage is to determine which AI Engine kernel or graph construct is causing a performance drop, a stall, or a deadlock in the design.
The following figure shows the tasks and techniques available in this stage.
The sections below list the different debug techniques available in this design stage.
Running and Analyzing Runtime Trace Data Using AI Engine Event Trace Flow
- Compile the design with event trace enabled, along with any other event trace related options.
- Run the design in hardware and collect event trace data.
- Open the trace summary file in Vitis™ analyzer, which provides a waveform view of the trace data collected above.
For details on resolving specific issues encountered when running event trace in hardware, see Troubleshooting Event Trace in Hardware. The feature is limited by the event trace counters, streams, DDR memory, and design resources available for event trace in the device.
Profiling Intra-Kernel Performance
You can also profile code blocks inside a specific kernel using the aie::tile::cycles() API.
To retrieve this value in hardware, write it to memory or to an output stream. The following example writes it to an output stream. The host application can then read the stream back to recover the profile data.
// Get the current tile
aie::tile tile = aie::tile::current();
unsigned long long time = tile.cycles(); // cycle counter of the AI Engine tile, read before the loop
writeincr(out, time);
{ // loop to be profiled
}
time = tile.cycles(); // cycle counter of the AI Engine tile, read after the loop
writeincr(out, time);
This method of profiling kernel code is very intrusive, because the added counter reads and stream writes alter the kernel being measured. Xilinx recommends using it when simulating the graph with the AI Engine simulator. Trace and profile data from simulation can also be used for this purpose.
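On the host side, the counter values written by the kernel arrive as start/stop pairs, and each pair must be reduced to an elapsed cycle count. The following is a minimal sketch of that reduction in plain C++, assuming the host has already copied the stream contents into a vector of 64-bit values. The helper names elapsed_cycles and cycles_to_ns are hypothetical and not part of any Xilinx API, and the clock frequency is a parameter you must supply for your own device.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical helper (not a Xilinx API): reduces start/stop counter pairs,
// in the order the kernel wrote them, to one elapsed cycle count per pair.
std::vector<std::uint64_t> elapsed_cycles(const std::vector<std::uint64_t>& stamps) {
    if (stamps.size() % 2 != 0) {
        throw std::invalid_argument("expected an even number of counter values");
    }
    std::vector<std::uint64_t> result;
    result.reserve(stamps.size() / 2);
    for (std::size_t i = 0; i < stamps.size(); i += 2) {
        // stamps[i] is the value read before the profiled block,
        // stamps[i + 1] is the value read after it.
        result.push_back(stamps[i + 1] - stamps[i]);
    }
    return result;
}

// Hypothetical helper: converts a cycle count to nanoseconds.
// clock_mhz is an assumption supplied by the caller; use the AI Engine
// clock frequency of the target device.
double cycles_to_ns(std::uint64_t cycles, double clock_mhz) {
    assert(clock_mhz > 0.0);
    return static_cast<double>(cycles) * 1000.0 / clock_mhz;
}
```

Each elapsed value includes the cost of the first writeincr call and of the second counter read, so treat the result as an upper bound on the profiled block rather than an exact figure.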
For details on the aie::tile::cycles() API, see AI Engine Kernel Coding Best Practices Guide (UG1079).
Vitis IDE Debugger
You can also use the Vitis IDE debugger to debug kernel source code. Details on the Vitis Debugger can be found in Debugging the AI Engine Application.
Next Stage: After you determine the cause of the throughput drop and fix the issue, proceed to stage 1 to rerun the design.