The following are recommended methods for debugging AI Engine-PL performance:
- Break the AI Engine graph into smaller graphs to analyze bottlenecks on silicon. For example:
  - If the graph has kernels in both the AI Engine and the PL, partition the graph into sub-graphs and verify the functionality and performance of each one separately. This localizes performance bottlenecks to a specific part of the design (see the sub-graph sketch after this list).
  - If compute kernels (in either the AI Engine or the PL) receive data from multiple AXI4-Stream interfaces, the kernels might underperform because data arrives on the different streams at different times. This can be caused either by backpressure or by differing compute complexity of upstream kernels in the graph. Break the graph down at the kernel level to verify that every stream performs as expected.
  Note: Alternatively, you can analyze bottlenecks using kernel-level performance measurement and debug.
- Replace the AI Engine graph with a simple pass-through system (a minimal pass-through kernel is sketched after this list).
- Use the event trace debug feature to count memory stalls in different kernels (an example trace configuration is shown after this list). For more information, see the Versal ACAP AI Engine Programming Environment User Guide (UG1076).
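
As a sketch of the first method, the following hypothetical sub-graph isolates a single AI Engine kernel between PLIO ports so its functionality and throughput can be checked independently of the rest of the design. The kernel name (`filter_kernel`), source file, and data file paths are placeholders, not part of any specific design.

```cpp
// Hypothetical sub-graph that isolates one kernel between PLIO ports.
// The PLIO ports stand in for the upstream and downstream kernels that
// were removed from the full graph.
#include <adf.h>
#include "kernels.h"   // assumed header declaring filter_kernel

using namespace adf;

class filter_subgraph : public graph {
public:
    input_plio  in;
    output_plio out;
    kernel      k;

    filter_subgraph() {
        // Test vectors replace the live data producers/consumers.
        in  = input_plio::create("DataIn",  plio_64_bits, "data/input.txt");
        out = output_plio::create("DataOut", plio_64_bits, "data/output.txt");

        k = kernel::create(filter_kernel);
        source(k) = "filter_kernel.cc";
        runtime<ratio>(k) = 0.9;

        connect<stream>(in.out[0], k.in[0]);
        connect<stream>(k.out[0], out.in[0]);
    }
};
```

Instantiating this sub-graph on its own lets you measure the stream rates at its PLIO boundaries and compare them against the rates observed in the full graph.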
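For the pass-through method, a minimal AI Engine kernel such as the one below can replace the original graph; it forwards the input stream to the output stream unchanged, so any remaining performance loss points to the PL or the data movers rather than the AI Engine compute. The data type and word count are assumptions and should be matched to your interface width and invocation size.

```cpp
// Hypothetical pass-through kernel: copies the input stream to the
// output stream with no computation in the path.
#include <adf.h>

void passthrough(input_stream<int32>* in, output_stream<int32>* out) {
    // Word count per invocation is a placeholder; size it to one
    // iteration's worth of data in your design.
    for (int i = 0; i < 512; ++i) {
        writeincr(out, readincr(in));
    }
}
```

The kernel is wired into a graph between PLIO ports in the same way as the sub-graph sketch above.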
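Event trace is typically enabled at run time through the xrt.ini file. The fragment below is only an assumed illustration; the option names and the available metric sets (for example, a stall-counting set such as `functions_all_stalls`) vary across tool versions, so confirm the exact settings in UG1076.

```ini
; Assumed xrt.ini fragment for AI Engine event trace (verify against UG1076)
[Debug]
aie_trace = true
; A stall-oriented metric set captures memory, stream, and lock stalls per kernel
aie_trace_metrics = functions_all_stalls
```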