The event::io_total_stream_running_to_idle_cycles profiling event tracks the
running and stall events that occur on the profiled AI Engine - PL interface. It
counts the cycles in which the interface is active (data is flowing through it)
and the cycles in which the interface is stalled. It does not count the
cycles in which the interface is idle.
After event::start_profiling(), the
performance counter waits for data to start flowing, and it pauses whenever the stream
is idle. A paused counter resumes counting when data flow resumes.
After event::stop_profiling(), the performance
counter is cleared and released. This API reports how well the AI Engine and PL kernels use the available bandwidth. It is
not a measure of best-case utilization of the port bandwidth.
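The counting rule described above can be illustrated with a small host-only model. This is a sketch, not the hardware behavior and not part of the AI Engine API: StreamState and count_running_and_stalled are hypothetical names, and starting the count at the first running cycle is an assumption based on the description above.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical per-cycle states of a stream port, for illustration only.
enum class StreamState { Idle, Running, Stalled };

// Model of the counting rule: once data has started to flow, running and
// stalled cycles increment the counter, and idle cycles leave it unchanged.
std::uint64_t count_running_and_stalled(const std::vector<StreamState>& trace) {
    std::uint64_t count = 0;
    bool started = false;  // Assumption: counting begins at the first running cycle.
    for (StreamState s : trace) {
        if (s == StreamState::Running) {
            started = true;
        }
        if (started && s != StreamState::Idle) {
            ++count;
        }
    }
    return count;
}
```

For a trace of idle, running, running, stalled, idle, idle, running, this model returns 4: three running cycles plus one stalled cycle, with the idle cycles ignored.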
Profile Graph Bandwidth Using the Input Port
The bandwidth of the graph can be defined as the fraction of the time that the graph can accept data.
const int WINDOW_SIZE_in_bytes=8192;
int iterations=999;
// Start counting running and stalled cycles on the graph input port
event::handle handle = event::start_profiling(gr_pl.in, event::io_total_stream_running_to_idle_cycles);
if(handle==event::invalid_handle){
    printf("ERROR: Invalid handle. Only two performance counters in an AIE-PL interface tile\n");
    return 1;
}
auto mm2s_run = mm2s(nullptr, OUTPUT_SIZE_MM2S); // Launch PL kernel mm2s
gr_pl.run(iterations);
gr_pl.wait();
long long cycle_count = event::read_profiling(handle); // Running + stalled cycles
// Ideal running cycles (four bytes per cycle) divided by the measured cycles
double bandwidth = (double) (WINDOW_SIZE_in_bytes*iterations/4) / cycle_count;
event::stop_profiling(handle);//Performance counter is released and cleared
Here, the total running cycles are calculated from the number of bytes
transferred. At four bytes per cycle, the total running cycles are WINDOW_SIZE_in_bytes*iterations/4. The total running
and stalled cycles are read from the performance counter by event::read_profiling().
If the profiled bandwidth is 1, the graph is running
faster than the PL kernel mm2s and the input port has
not been stalled.
If the profiled bandwidth is less than 1, PL kernel
mm2s can send data faster than the graph or PL
kernel s2mm can receive it. You might need to evaluate
whether the bandwidth drop is caused by the graph or by PL kernel s2mm.
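The bandwidth calculation above can be worked through on the host without any hardware. The following is a minimal sketch, not part of the AI Engine API: profiled_bandwidth and kBytesPerCycle are hypothetical names, and the cycle counts in the usage note are made-up inputs chosen to show the arithmetic, not measurements.

```cpp
#include <cassert>
#include <cstdint>

// Taken from the text above: the stream moves four bytes per cycle.
constexpr std::uint64_t kBytesPerCycle = 4;

// Hypothetical helper: ratio of the ideal running cycles to the measured
// running-plus-stalled cycles returned by the performance counter.
double profiled_bandwidth(std::uint64_t window_size_in_bytes,
                          std::uint64_t iterations,
                          std::uint64_t cycle_count) {
    assert(cycle_count > 0 && "the counter value must be nonzero");
    // 64-bit arithmetic avoids overflow for large windows or iteration counts.
    const std::uint64_t running_cycles =
        window_size_in_bytes * iterations / kBytesPerCycle;
    return static_cast<double>(running_cycles) /
           static_cast<double>(cycle_count);
}
```

With the values used above (8192 bytes, 999 iterations), the ideal running cycle count is 2045952. A counter reading of 2045952 therefore gives a bandwidth of 1, and a reading of 4091904 gives 0.5, meaning that the port was stalled for half of the counted cycles.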
Profile Graph Bandwidth Using the Output Port
The bandwidth of the graph can be defined as the fraction of the time that
the graph can send data. If the profiled bandwidth is 1, the graph is
not blocked by PL kernel s2mm. If the profiled
bandwidth is less than 1, the graph is blocked for part of the time by
back pressure from s2mm. Example code that
profiles the graph bandwidth through the graph output port follows:
const int WINDOW_SIZE_in_bytes=8192;
int iterations=999;
// Start counting running and stalled cycles on the graph output port
event::handle handle = event::start_profiling(gr_pl.dataout, event::io_total_stream_running_to_idle_cycles);
if(handle==event::invalid_handle){
    printf("ERROR: Invalid handle. Only two performance counters in an AIE-PL interface tile\n");
    return 1;
}
gr_pl.run(iterations);
gr_pl.wait();
long long cycle_count = event::read_profiling(handle); // Running + stalled cycles
// Ideal running cycles (four bytes per cycle) divided by the measured cycles
double bandwidth = (double) (WINDOW_SIZE_in_bytes*iterations/4) / cycle_count;
event::stop_profiling(handle);//Performance counter is released and cleared
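Because the counter holds running plus stalled cycles, and the running cycles follow from the byte count, the stalled share can be estimated by subtraction. The sketch below assumes four bytes per cycle, as stated earlier. StallEstimate and estimate_stalls are hypothetical names, not part of the AI Engine API, and the result is an estimate that is clamped to zero if the counter reads lower than the ideal running cycles.

```cpp
#include <cstdint>

// Hypothetical result type for illustration only.
struct StallEstimate {
    std::uint64_t running_cycles;   // Ideal cycles needed to move the data
    std::uint64_t stalled_cycles;   // Counted cycles beyond the ideal
    double stalled_fraction;        // stalled_cycles / cycle_count
};

// Split a counter reading into running and stalled cycles,
// assuming the stream moves four bytes per running cycle.
StallEstimate estimate_stalls(std::uint64_t bytes_transferred,
                              std::uint64_t cycle_count) {
    const std::uint64_t running = bytes_transferred / 4;
    // Clamp to zero so a reading below the ideal does not wrap around.
    const std::uint64_t stalled =
        cycle_count > running ? cycle_count - running : 0;
    const double fraction =
        cycle_count > 0
            ? static_cast<double>(stalled) / static_cast<double>(cycle_count)
            : 0.0;
    return {running, stalled, fraction};
}
```

For example, passing WINDOW_SIZE_in_bytes*iterations as bytes_transferred together with the value returned by event::read_profiling() gives the number of cycles in which the output port was blocked by back pressure.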