The event::io_total_stream_running_to_idle_cycles enumeration can be used to accumulate the running and stall events that occur on the profiled AI Engine - PL interface. It counts the cycles in which data passes through the interface and the cycles in which the interface is stalled, but it ignores the idle state.
After event::start_profiling(), the performance counter waits for data to start running, and it pauses whenever the stream is idle. After the performance counter is paused, it resumes when new data arrives. After event::stop_profiling(), the performance counter is cleared and released.
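The counting rule described above can be illustrated with a small software model. This is only a sketch, not the hardware implementation: the StreamState enum and the count_running_and_stalled function are hypothetical names, and the model assumes that a stall seen before any data has started running is not counted.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical per-cycle stream states; these names are illustrative
// and are not part of the ADF API.
enum class StreamState { Idle, Running, Stalled };

// Simplified model of the counting rule: running and stalled cycles are
// accumulated, idle cycles are ignored, and counting only starts (or
// resumes) when data is running.
inline std::uint64_t count_running_and_stalled(const std::vector<StreamState>& trace) {
    std::uint64_t count = 0;
    bool active = false;  // false until data starts, and again after an idle cycle
    for (StreamState s : trace) {
        switch (s) {
            case StreamState::Running:
                active = true;  // running data starts or resumes the counter
                ++count;
                break;
            case StreamState::Stalled:
                if (active) ++count;  // assumption: stalls count only while active
                break;
            case StreamState::Idle:
                active = false;  // an idle stream pauses the counter
                break;
        }
    }
    assert(count <= trace.size());  // sanity check: at most one count per cycle
    return count;
}
```

For a trace of Idle, Stalled, Running, Running, Stalled, Idle, Idle, Running, Stalled, this model counts three running cycles and two stalled cycles. The leading stall and all idle cycles are skipped.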
Profile Graph Bandwidth Using the Input Port
The bandwidth of the graph can be defined as a percentage of the time that the graph can accept data.
const int WINDOW_SIZE_in_bytes=8192;
int iterations=999;
// Start counting running and stalled cycles on the graph input port
event::handle handle = event::start_profiling(gr_pl.in, event::io_total_stream_running_to_idle_cycles);
if(handle==event::invalid_handle){
    printf("ERROR: Invalid handle. Only two performance counters are available in an AIE-PL interface tile\n");
    return 1;
}
auto mm2s_run = mm2s(nullptr, OUTPUT_SIZE_MM2S);
gr_pl.run(iterations);
gr_pl.wait();
long long cycle_count = event::read_profiling(handle);
// Ideal running cycles (four bytes per cycle) divided by running plus stalled cycles
double bandwidth = (double) (WINDOW_SIZE_in_bytes*iterations/4) / cycle_count;
event::stop_profiling(handle);//Performance counter is released and cleared
Here, the total running cycles are calculated from the number of bytes transferred. At four bytes per cycle, the total running cycles are WINDOW_SIZE_in_bytes*iterations/4. The total running and stalled cycles are read from the performance counter by event::read_profiling().
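The arithmetic behind the bandwidth figure can be checked off-target with a small helper. This is a minimal sketch under stated assumptions: kBytesPerCycle, ideal_running_cycles, and profiled_bandwidth are hypothetical names that are not part of the ADF API, and 64-bit integers are used only so that the byte count does not overflow for long runs.

```cpp
#include <cassert>
#include <cstdint>

// Bytes moved per cycle on the profiled stream, as stated in the text.
constexpr std::int64_t kBytesPerCycle = 4;

// Cycles needed to move all the data if the port never stalls.
inline std::int64_t ideal_running_cycles(std::int64_t window_bytes,
                                         std::int64_t iterations) {
    assert(window_bytes >= 0 && iterations >= 0);
    // 64-bit arithmetic keeps the product from overflowing for long runs.
    return window_bytes * iterations / kBytesPerCycle;
}

// Ratio of ideal running cycles to the counter reading
// (running plus stalled cycles).
inline double profiled_bandwidth(std::int64_t window_bytes,
                                 std::int64_t iterations,
                                 std::int64_t cycle_count) {
    assert(cycle_count > 0);  // a zero reading would mean no data was observed
    return static_cast<double>(ideal_running_cycles(window_bytes, iterations)) /
           static_cast<double>(cycle_count);
}
```

With the values from the example (8192 bytes per window, 999 iterations), ideal_running_cycles returns 2045952. A counter reading of the same value gives a bandwidth of 1, and a hypothetical reading of 4091904 gives 0.5.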
If the profiled bandwidth is 1, the graph is running faster than the PL kernel mm2s, so the input port has not been stalled.
If the profiled bandwidth is less than 1, the PL kernel mm2s can send data faster than the graph or the PL kernel s2mm can receive it. You might need to evaluate whether the bandwidth drop is caused by the graph or by the PL kernel s2mm.
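When the bandwidth is below 1, it can help to express the drop as a number of stalled cycles. The sketch below relies only on the fact that the counter accumulates running plus stalled cycles. The function names are hypothetical, and the code assumes that the counter reading is at least as large as the running cycles computed from the bytes transferred.

```cpp
#include <cassert>
#include <cstdint>

// Stalled cycles implied by a counter reading, given that the counter
// accumulates running plus stalled cycles and ignores idle ones.
inline std::int64_t stalled_cycles(std::int64_t running_cycles,
                                   std::int64_t cycle_count) {
    assert(running_cycles >= 0 && cycle_count >= running_cycles);
    return cycle_count - running_cycles;
}

// Fraction of counted cycles spent stalled; equals 1 - bandwidth.
inline double stall_fraction(std::int64_t running_cycles,
                             std::int64_t cycle_count) {
    assert(cycle_count > 0);
    return static_cast<double>(stalled_cycles(running_cycles, cycle_count)) /
           static_cast<double>(cycle_count);
}
```

For example, with 2045952 running cycles and a hypothetical counter reading of 2727936, the port was stalled for 681984 cycles. That is a stall fraction of 0.25 and a bandwidth of 0.75. The stall count does not say which side caused the stall, so the graph and the PL kernel still have to be evaluated separately.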
Profile Graph Bandwidth Using the Output Port
If the profiled bandwidth is 1, the graph output is not blocked by the PL kernel s2mm. If the profiled bandwidth is less than 1, the graph is blocked for some percentage of the time by s2mm due to back pressure. Example code to profile graph bandwidth through the graph output port follows:
const int WINDOW_SIZE_in_bytes=8192;
int iterations=999;
// Start counting running and stalled cycles on the graph output port
event::handle handle = event::start_profiling(gr_pl.dataout, event::io_total_stream_running_to_idle_cycles);
if(handle==event::invalid_handle){
    printf("ERROR: Invalid handle. Only two performance counters are available in an AIE-PL interface tile\n");
    return 1;
}
gr_pl.run(iterations);
gr_pl.wait();
long long cycle_count = event::read_profiling(handle);
// Ideal running cycles (four bytes per cycle) divided by running plus stalled cycles
double bandwidth = (double) (WINDOW_SIZE_in_bytes*iterations/4) / cycle_count;
event::stop_profiling(handle);//Performance counter is released and cleared