The AI Engine has hardware performance counters that can be configured to count hardware events for measuring performance metrics. You can use the event API together with the graph control API to profile certain performance metrics during a controlled period of graph execution. The event API supports only platform I/O ports (PLIO and GMIO) for measuring performance metrics such as platform I/O port bandwidth, graph throughput, and graph latency.
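Bandwidth, throughput, and latency are all derived from cycle counts. The following C++ sketch shows the unit conversions. It assumes that the counters count AI Engine clock cycles and uses a placeholder clock frequency of 1 GHz (kAssumedAieClockHz), so substitute the actual AI Engine frequency of your device. The function names are hypothetical helpers, not part of the ADF API.

```cpp
#include <cassert>
#include <cstdint>

// Placeholder (assumption): replace with the actual AI Engine clock frequency.
constexpr double kAssumedAieClockHz = 1.0e9;

// Average bandwidth in bytes per second for 'bytes' transferred over 'cycles'.
double bandwidthBytesPerSec(std::uint64_t bytes, std::uint64_t cycles,
                            double clockHz = kAssumedAieClockHz) {
    assert(cycles > 0);
    return static_cast<double>(bytes) * clockHz / static_cast<double>(cycles);
}

// Graph throughput in iterations per second for 'iterations' completed in 'cycles'.
double throughputIterPerSec(std::uint64_t iterations, std::uint64_t cycles,
                            double clockHz = kAssumedAieClockHz) {
    assert(cycles > 0);
    return static_cast<double>(iterations) * clockHz / static_cast<double>(cycles);
}

// Latency in nanoseconds for a cycle count (for example, the gap between two port events).
double cyclesToNanoseconds(std::uint64_t cycles, double clockHz = kAssumedAieClockHz) {
    return static_cast<double>(cycles) * 1.0e9 / clockHz;
}
```

For example, 1024 bytes moved in 256 cycles at the assumed 1 GHz gives bandwidthBytesPerSec(1024, 256) = 4e9 bytes per second. Passing the clock frequency explicitly keeps the helpers correct when the device runs at a different frequency.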
The event API tracks the events that occur on the stream switches of the nets that cross the AI Engine - PL interfaces. These stream switch events include idle, running, and stall, as shown in the following figure.
- When no data is passing through the stream switch, the stream switch is in an idle state.
- When data is passing through the stream switch, the stream switch is in a running state.
- When all the FIFOs on the net are full, the stream switch is in a stall state.
- When the data transfer resumes, the stream switch returns to a running state.
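To make the three states concrete, the following C++ sketch classifies each clock cycle of a single net as idle, running, or stall. It is a toy model, not how the hardware is implemented: the buffering capacity, the one-word-per-cycle transfer rate, and consumerStart (a loose stand-in for the cycle at which the receiving side starts draining data) are all assumptions chosen for illustration.

```cpp
#include <cassert>

// Toy model of one stream-switch net; not the hardware implementation.
enum class SwitchState { Idle, Running, Stall };

// Classify one cycle from the producer's point of view.
//   hasData        : the producer has a word to send this cycle
//   downstreamFull : every buffer/FIFO on the net is full
SwitchState classify(bool hasData, bool downstreamFull) {
    if (!hasData) return SwitchState::Idle;
    if (downstreamFull) return SwitchState::Stall;
    return SwitchState::Running;
}

struct StateTally {
    int idle = 0;
    int running = 0;
    int stall = 0;
};

// Simulate 'cycles' clock cycles. The producer has 'words' words to send and
// moves at most one word per cycle into buffering that holds 'capacity' words.
// The consumer drains one word per cycle, starting at cycle 'consumerStart'.
StateTally simulate(int words, int capacity, int consumerStart, int cycles) {
    assert(words >= 0 && capacity > 0 && cycles >= 0);
    StateTally tally;
    int sent = 0;       // words the producer has pushed so far
    int occupancy = 0;  // words currently held in the buffering
    for (int c = 0; c < cycles; ++c) {
        // Consumer side: once started, remove one buffered word per cycle.
        if (c >= consumerStart && occupancy > 0) --occupancy;

        // Producer side: classify this cycle, then move a word if possible.
        switch (classify(sent < words, occupancy >= capacity)) {
            case SwitchState::Idle:    ++tally.idle; break;
            case SwitchState::Stall:   ++tally.stall; break;
            case SwitchState::Running: ++tally.running; ++sent; ++occupancy; break;
        }
    }
    return tally;
}
```

For example, simulate(6, 4, 6, 10) yields 6 running, 2 stall, and 2 idle cycles: the producer fills the four-word buffering before the consumer starts, stalls for two cycles, runs again once draining begins, and goes idle after its last word is sent.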
The following figure shows an example graph in which the mm2s PL kernel sends data to the AI Engine and the AI Engine sends data to the s2mm PL kernel.
Different ports can be routed through the same AI Engine - PL interface column, in which case they share that column's performance counters. Check the array view in the Vitis IDE to see which columns the ports are routed through. The following figure shows the array view of the preceding example; the stream switches circled in red are where the event API monitors events.
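Because ports routed through the same column share that column's counters, it can help to check counter demand per column before profiling. The following C++ sketch does this for a hypothetical list of ports. The port names, the column numbers (which you would read from the array view), and the number of counters each measurement needs are all assumed inputs; only the limit of two counters per column comes from this documentation.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Two performance counters per AI Engine - PL interface column.
constexpr int kCountersPerColumn = 2;

// Hypothetical description of one port to be profiled.
struct ProfiledPort {
    std::string name;    // illustrative port name
    int column;          // interface column the port is routed through
    int countersNeeded;  // assumed counter cost of the chosen measurement
};

// Returns, in ascending order, the columns whose total demand exceeds the limit.
std::vector<int> oversubscribedColumns(const std::vector<ProfiledPort>& ports) {
    std::map<int, int> demand;  // column -> counters requested
    for (const ProfiledPort& p : ports) {
        assert(p.countersNeeded > 0);
        demand[p.column] += p.countersNeeded;
    }
    std::vector<int> over;
    for (const auto& entry : demand) {
        if (entry.second > kCountersPerColumn) over.push_back(entry.first);
    }
    return over;
}
```

With four hypothetical ports, three of them routed through column 6 at one counter each, the function reports column 6 as oversubscribed, signaling that one of those measurements has to wait until counters are released.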
- The input buffer buf0 in the AI Engine is ready to accept data from the mm2s PL kernel as soon as the graph has been initialized. When the mm2s PL kernel starts, it sequentially fills the ping-pong buffers and the FIFOs inside the stream switch connected to buf0. Data transfer to and from these buffers does not depend on graph.run().
- Each column of the AI Engine - PL interface has two performance counters. Because the number of performance counters is limited, use event::stop_profiling() to release them when profiling is finished.
- Calling the graph and profiling APIs adds some overhead. Read the profiling results with event::read_profiling(). The results can vary if the performance counters are not stopped before event::read_profiling() is called.
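The last two points can be pictured with a toy model. In the following C++ sketch, ToyCounter and ToyColumn are hypothetical stand-ins, not the ADF event API. A counter that is still running keeps changing between reads, while a stopped one returns a stable value. A column hands out at most two counters until one is released, which is the role the text assigns to event::stop_profiling().

```cpp
#include <cassert>
#include <cstdint>

// Toy performance counter: counts ticks only while started.
class ToyCounter {
public:
    void start() { running_ = true; }
    void stop() { running_ = false; }
    void tick() { if (running_) ++value_; }  // one cycle of the counted event
    std::uint64_t read() const { return value_; }
private:
    bool running_ = false;
    std::uint64_t value_ = 0;
};

// Toy interface column: at most two counters can be in use at once.
class ToyColumn {
public:
    static constexpr int kCounters = 2;
    bool acquire() {
        if (inUse_ == kCounters) return false;  // all counters are taken
        ++inUse_;
        return true;
    }
    void release() {
        assert(inUse_ > 0);
        --inUse_;
    }
    int inUse() const { return inUse_; }
private:
    int inUse_ = 0;
};
```

In the model, reading a running counter twice gives two different values, and a third acquire() on a column fails until release() returns a counter to the pool.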
adf::registerXRT is required before using the event profiling ADF APIs. For example:

```cpp
#include "adf/adf_api/XRTConfig.h"
......
auto device = xrt::device(0);                    // open the device at index 0
auto uuid = device.load_xclbin(xclbinFilename);  // load the xclbin and keep its UUID
auto dhdl = xrtDeviceOpenFromXcl(device);        // device handle passed to registerXRT
adf::registerXRT(dhdl, uuid.get());              // must precede any event API call
event::handle handle = event::start_profiling(......);
```