The event::io_total_stream_running_to_idle_cycles profiling event can be used to track
the running and stall events that occur on the profiled AI Engine - PL interface. It
counts the cycles in which the interface is active (data flowing through) and the
cycles in which the interface is stalled. It does not count the cycles in which the
interface is idle. After event::start_profiling(), the performance counter waits for
running data to start, and it is paused while the stream is idle. Once paused, the
counter resumes when data flow resumes. After event::stop_profiling(), the performance
counter is cleared and released. This API reports how well the AI Engine and PL kernels
utilize the available bandwidth. It is not a measure of best-case utilization of the
port bandwidth.
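The counting rule described above can be illustrated with a small software model. This is only a sketch under stated assumptions: the StreamState enum and the count_running_and_stalled function are hypothetical names that are not part of the AI Engine API, and the model assumes that nothing is counted until the first running cycle is seen.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical per-cycle stream states; the names are illustrative only.
enum class StreamState { Idle, Running, Stalled };

// Simplified software model (an assumption, not the hardware logic):
// the counter is armed by the first Running cycle, then Running and
// Stalled cycles increment it, and Idle cycles leave it paused.
std::uint64_t count_running_and_stalled(const std::vector<StreamState>& trace) {
    std::uint64_t count = 0;
    bool started = false;
    for (StreamState s : trace) {
        if (!started) {
            if (s != StreamState::Running) {
                continue;  // still waiting for running data to start
            }
            started = true;
        }
        if (s != StreamState::Idle) {
            ++count;  // Running or Stalled cycle
        }
    }
    return count;
}
```

In this model, a trace of two idle cycles, two running cycles, one stalled cycle, two idle cycles, and one more running cycle yields a count of 4, because the idle cycles are never counted.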
Profile Graph Bandwidth Using the Input Port
The bandwidth of the graph can be defined as the fraction of the time that the graph can accept data.
const int WINDOW_SIZE_in_bytes=8192;
int iterations=999;
// Start counting running and stalled cycles on the graph input port
event::handle handle = event::start_profiling(gr_pl.in, event::io_total_stream_running_to_idle_cycles);
if(handle==event::invalid_handle){
    printf("ERROR: Invalid handle. Only two performance counters in an AIE-PL interface tile\n");
    return 1;
}
auto mm2s_run = mm2s(nullptr, OUTPUT_SIZE_MM2S);
gr_pl.run(iterations);
gr_pl.wait();
// Running plus stalled cycles observed on the port
long long cycle_count = event::read_profiling(handle);
// Ideal cycles (4 bytes per cycle) divided by measured cycles
double bandwidth = (double) (WINDOW_SIZE_in_bytes*iterations/4) / cycle_count;
event::stop_profiling(handle); // Performance counter is released and cleared
Here, the total running cycles are calculated from the number of bytes transferred.
At four bytes per cycle, the total running cycles are WINDOW_SIZE_in_bytes*iterations/4.
The total running and stalled cycles are read from the performance counter by
event::read_profiling().
If the profiled bandwidth is 1, the graph is running faster than the PL kernel mm2s,
so the input port has not been stalled.
If the profiled bandwidth is less than 1, the PL kernel mm2s can send data faster than
the graph or the PL kernel s2mm can receive it. You might need to evaluate whether the
bandwidth drop is caused by the graph or by the PL kernel s2mm.
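The ratio and its interpretation can be checked off-target with a small helper. This is a minimal sketch, not part of the AI Engine API: the function name stream_utilization and its parameters are hypothetical, and the default of four bytes per cycle is taken from the description above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of the AI Engine API): ratio of the ideal
// transfer cycles to the running-plus-stalled cycles read from the counter.
double stream_utilization(std::uint64_t window_bytes,
                          std::uint64_t iterations,
                          std::uint64_t counted_cycles,
                          std::uint64_t bytes_per_cycle = 4) {
    assert(bytes_per_cycle > 0);
    if (counted_cycles == 0) {
        return 0.0;  // the counter never saw running data
    }
    // Compute in double to avoid integer truncation and overflow.
    const double ideal_cycles = static_cast<double>(window_bytes) *
                                static_cast<double>(iterations) /
                                static_cast<double>(bytes_per_cycle);
    return ideal_cycles / static_cast<double>(counted_cycles);
}
```

With the values used in the example, the ideal count is 8192*999/4 = 2045952 cycles. A counter reading of 2045952 gives a ratio of 1 (no stalls), while a reading of 4091904 gives 0.5, meaning the port was stalled for half of the counted cycles.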
Profile Graph Bandwidth Using the Output Port
s2mm. If the profiled bandwidth is less than 1, the graph is blocked for some percentage
of the time by s2mm due to back pressure. Example code to profile graph bandwidth
through the graph output port is as follows:
const int WINDOW_SIZE_in_bytes=8192;
int iterations=999;
// Start counting running and stalled cycles on the graph output port
event::handle handle = event::start_profiling(gr_pl.dataout, event::io_total_stream_running_to_idle_cycles);
if(handle==event::invalid_handle){
    printf("ERROR: Invalid handle. Only two performance counters in an AIE-PL interface tile\n");
    return 1;
}
gr_pl.run(iterations);
gr_pl.wait();
// Running plus stalled cycles observed on the port
long long cycle_count = event::read_profiling(handle);
// Ideal cycles (4 bytes per cycle) divided by measured cycles
double bandwidth = (double) (WINDOW_SIZE_in_bytes*iterations/4) / cycle_count;
event::stop_profiling(handle); // Performance counter is released and cleared