You can obtain profiling data when you run your design in simulation or in hardware at run time. Analyzing this data helps you gauge the efficiency of the kernels and the stall and active times associated with each AI Engine, and pinpoint any AI Engine kernel whose performance might not be optimal. It also lets you collect data on design latency, throughput, and bandwidth.
You have two options to gather this information:
- Use run-time event APIs in your PS host code
- Use the performance counters built into the hardware, through a compile-time option
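
Whichever option you choose, the raw result is typically a cycle count that you convert into latency or throughput yourself. The following is a minimal sketch of that arithmetic, not part of any tool API: the helper names `cycles_to_seconds` and `throughput_bytes_per_sec` are hypothetical, and the clock frequency is an assumption that you must replace with the actual AI Engine clock of your device and design.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not a tool or library API).
// Converts a cycle count reported by a profiling counter into seconds.
// clock_hz is an assumed AI Engine clock frequency in Hz; use the value
// that applies to your device and design.
double cycles_to_seconds(std::uint64_t cycles, double clock_hz) {
    assert(clock_hz > 0.0);  // a non-positive clock frequency is meaningless
    return static_cast<double>(cycles) / clock_hz;
}

// Hypothetical helper (not a tool or library API).
// Computes throughput in bytes per second from the number of bytes
// transferred and the cycles the transfer took.
// Returns 0.0 when no cycles were counted, to avoid dividing by zero.
double throughput_bytes_per_sec(std::uint64_t bytes,
                                std::uint64_t cycles,
                                double clock_hz) {
    if (cycles == 0) {
        return 0.0;
    }
    return static_cast<double>(bytes) / cycles_to_seconds(cycles, clock_hz);
}
```

For example, under an assumed 1 GHz clock, 4096 bytes transferred in 1024 cycles corresponds to roughly 4 GB/s. Keeping the conversion in one place makes it easier to compare numbers gathered from simulation and from hardware.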