You can use the Xilinx Runtime (XRT) APIs to measure performance metrics such as platform I/O port bandwidth, graph throughput, and graph latency. Call these APIs from the host application code through the AI Engine graph object, which is used to initialize, run, update, and exit graphs, and which can also be profiled to collect those bandwidth, throughput, and latency measurements. For more information, see Run-Time Event API for Performance Profiling in the AI Engine User Guide (UG1076).
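For example, the following host-code sketch combines the graph object's control calls with the run-time event API to estimate port bandwidth. It assumes a generated graph class `mygraph` with an output PLIO member `out0`, a transfer of 2048 bytes, and a 1 GHz AI Engine clock; all of these names and numbers are placeholders for your design. In the hardware flow, the device must additionally be registered with `adf::registerXRT` before the event calls are made.

```cpp
#include <cstdio>
#include "graph.h"   // generated ADF header; declares the placeholder graph class mygraph

mygraph gr;          // AI Engine graph object (placeholder instance name)

int main() {
    const unsigned bytes = 2048;   // bytes expected on the profiled port (placeholder)

    gr.init();

    // Count AI Engine clock cycles from the first stream transfer on the
    // output PLIO "out0" until "bytes" bytes have been transferred.
    event::handle h = event::start_profiling(
        gr.out0, event::io_stream_start_to_bytes_transferred_cycles, bytes);

    gr.run(16);        // run 16 graph iterations
    gr.wait();         // block until the last run completes

    long long cycles = event::read_profiling(h);
    event::stop_profiling(h);

    // Bandwidth = bytes / elapsed time. Assuming a 1 GHz AI Engine clock
    // (1 ns per cycle), bytes per cycle equals GB/s.
    double gbps = static_cast<double>(bytes) / static_cast<double>(cycles);
    std::printf("out0 bandwidth: %.3f GB/s over %lld cycles\n", gbps, cycles);

    gr.end();
    return 0;
}
```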
AI Engine performance analysis typically targets system-level issues such as missing or mismatched locks, buffer overruns, and incorrectly programmed direct memory access (DMA) buffers, as well as memory and core stalls, deadlocks, and hot spots. The AI Engine architecture directly supports generating, collecting, and streaming events as trace data during simulation, hardware emulation, or hardware execution. You can then analyze this data for functional issues, inter-kernel latency problems, memory stalls, and deadlocks (see the trace-enablement sketch after this list). For more information, see the following:
- AI Engine Performance and Deadlock Analysis Tutorial available from the Xilinx GitHub repository
- Performance Analysis of AI Engine Graph Application during Simulation in the AI Engine User Guide (UG1076)
- Performance Analysis of AI Engine Graph Application on Hardware in the AI Engine User Guide (UG1076)
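On hardware, event trace collection is typically switched on through an xrt.ini file placed next to the host executable. The following is a minimal sketch; the option names follow recent UG1076 releases and the buffer_size value is an arbitrary placeholder, so confirm both against the guide for your tool version.

```ini
; xrt.ini next to the host executable (sketch; confirm option names
; against UG1076 for your XRT version)
[Debug]
aie_trace = true
aie_profile = true

[AIE_trace_settings]
; size of the device buffer that holds trace data before offload (placeholder)
buffer_size = 100M
```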