The event::io_stream_start_difference_cycles enumeration can be used to measure the latency between two PLIO or GMIO ports. After the event::start_profiling() API is called, two performance counters start incrementing each cycle, each waiting for one of two independent nets to receive its first data. When the first data passes a net, the corresponding performance counter stops. The value read back by event::read_profiling() is the difference between the two performance counters. After event::stop_profiling() is called, the performance counters are cleared and released.
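The counter behavior described above can be illustrated with a small software model. This is only a sketch in plain C++, not the hardware implementation or part of the event API: the `StartDifferenceModel` type and the `simulate_start_difference` function are hypothetical names introduced for illustration, and the sign convention (second counter minus first) is assumed from the interpretation given at the end of this section.

```cpp
#include <cassert>

// Toy model of two counters that start together and stop independently.
// Hypothetical illustration only; it does not touch any device.
struct StartDifferenceModel {
    long long counter_a = 0;   // counter watching the first port
    long long counter_b = 0;   // counter watching the second port
    bool a_stopped = false;
    bool b_stopped = false;

    // Advance one clock cycle. Each flag says whether the first data
    // is seen on the corresponding net during this cycle.
    void tick(bool first_data_on_a, bool first_data_on_b) {
        if (first_data_on_a) a_stopped = true;
        if (first_data_on_b) b_stopped = true;
        if (!a_stopped) ++counter_a;
        if (!b_stopped) ++counter_b;
    }

    // Assumed convention: positive means the second port saw data later.
    long long read() const { return counter_b - counter_a; }
};

// Run the model until both counters have stopped and return the difference.
long long simulate_start_difference(long long first_a_cycle,
                                    long long first_b_cycle) {
    assert(first_a_cycle >= 0 && first_b_cycle >= 0);
    StartDifferenceModel model;
    for (long long cycle = 0; !(model.a_stopped && model.b_stopped); ++cycle) {
        model.tick(cycle == first_a_cycle, cycle == first_b_cycle);
    }
    return model.read();
}
```

In this model, if the first port sees data at cycle 10 and the second at cycle 135, `simulate_start_difference(10, 135)` returns 125. Swapping the two arrival cycles returns -125.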
Profile Graph Latency
Graph latency can be defined as the time from receiving the first input data to producing the first output data. It does not depend on the number of iterations the graph is run for. The following examples show how to profile graph latency using the event API in the AI Engine simulation flow and in the hardware emulation and hardware flows.
AI Engine Simulation
Here, event::start_profiling() is called with two different PLIO ports: the graph input gr_pl.in and the graph output gr_pl.dataout.
event::handle handle = event::start_profiling(gr_pl.in, gr_pl.dataout, event::io_stream_start_difference_cycles);
if(handle==event::invalid_handle){
    printf("ERROR: Invalid handle. Only two performance counters are available in an AIE-PL interface tile\n");
    return 1;
}
gr_pl.run(iterations); // Data transfer starts after graph.run()
gr_pl.wait();
long long cycle_count = event::read_profiling(handle);
printf("Latency cycles: %lld\n", cycle_count); // %lld matches the long long argument
event::stop_profiling(handle); // Performance counters are released and cleared
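The value read back is a cycle count, so turning it into a time requires the frequency of the clock that drives the counters. The helper below is a minimal sketch of that arithmetic. The function name `cycles_to_ns` is hypothetical, and the clock frequency is an assumption that the caller must supply for the platform in use; it is not queried from the device.

```cpp
#include <cassert>

// Convert a cycle count to nanoseconds.
// clock_mhz is an assumed, caller-supplied frequency in MHz.
double cycles_to_ns(long long cycles, double clock_mhz) {
    assert(clock_mhz > 0.0);
    // One cycle lasts 1000 / clock_mhz nanoseconds.
    return static_cast<double>(cycles) * 1000.0 / clock_mhz;
}
```

For example, with an assumed 1250 MHz clock, a reading of 125 cycles corresponds to `cycles_to_ns(125, 1250.0)`, which is 100 ns.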
Hardware Emulation and Hardware
auto s2mm_run = s2mm(out_bo, nullptr, OUTPUT_SIZE);
event::handle handle = event::start_profiling(gr_pl.in, gr_pl.dataout, event::io_stream_start_difference_cycles);
if(handle==event::invalid_handle){
    printf("ERROR: Invalid handle. Only two performance counters are available in an AIE-PL interface tile\n");
    return 1;
}
gr_pl.run(iterations);
auto mm2s_run = mm2s(nullptr, OUTPUT_SIZE_MM2S); // Input data transfer starts
s2mm_run.wait(); // Make sure both ports have data transferred
long long cycle_count = event::read_profiling(handle);
printf("Latency cycles: %lld\n", cycle_count); // %lld matches the long long argument
event::stop_profiling(handle); // Performance counters are released and cleared
In the hardware emulation and hardware flows, input data transfer begins when the PL kernel mm2s starts. To keep any overhead that graph.run() may introduce out of the measured graph latency, start mm2s after both event::start_profiling() and graph.run().
Profile Latency Difference Between Two Ports
This method is not limited to profiling the latency between an input port and an output port of the same graph. It can be used to profile the latency between any two ports. For example, it can measure the latency difference between two output ports that share a common input port.
AI Engine Simulation
event::handle handle = event::start_profiling(gr_pl.dataout, gr_pl.dataout2, event::io_stream_start_difference_cycles);
if(handle==event::invalid_handle){
    printf("ERROR: Invalid handle. Only two performance counters are available in an AIE-PL interface tile\n");
    return 1;
}
gr_pl.run(iterations);
gr_pl.wait();
long long cycle_count = event::read_profiling(handle);
printf("Latency cycles: %lld\n", cycle_count); // %lld matches the long long argument
event::stop_profiling(handle); // Performance counters are released and cleared
Hardware Emulation and Hardware
auto s2mm_run = s2mm(out_bo, nullptr, OUTPUT_SIZE);
event::handle handle = event::start_profiling(gr_pl.dataout, gr_pl.dataout2, event::io_stream_start_difference_cycles);
if(handle==event::invalid_handle){
    printf("ERROR: Invalid handle. Only two performance counters are available in an AIE-PL interface tile\n");
    return 1;
}
gr_pl.run(iterations);
auto mm2s_run = mm2s(nullptr, OUTPUT_SIZE_MM2S); // Input data transfer starts
s2mm_run.wait(); // Make sure both ports have data transferred
long long cycle_count = event::read_profiling(handle);
printf("Latency cycles: %lld\n", cycle_count); // %lld matches the long long argument
event::stop_profiling(handle); // Performance counters are released and cleared
Here, a positive value indicates that data arrives at gr_pl.dataout2 later than at gr_pl.dataout, while a negative value indicates that data arrives at gr_pl.dataout2 earlier than at gr_pl.dataout.
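The sign rule above can be wrapped in a small reporting helper. The following is a sketch in plain C++ and is not part of the event API: the function name `describe_skew` and its parameters are hypothetical, and the handling of a zero reading (both ports receiving their first data in the same cycle) is an assumption that follows from the rule rather than something stated in this section.

```cpp
#include <cassert>
#include <string>

// Turn a signed start-difference reading into a readable message.
// first_port and second_port are the names of the ports in the order
// they were passed to event::start_profiling().
std::string describe_skew(long long cycle_count,
                          const std::string& first_port,
                          const std::string& second_port) {
    assert(!first_port.empty() && !second_port.empty());
    if (cycle_count == 0) {
        // Assumed case: equal counters mean the same arrival cycle.
        return first_port + " and " + second_port +
               " received their first data in the same cycle";
    }
    // Positive: second port later. Negative: second port earlier.
    const bool second_later = cycle_count > 0;
    const long long magnitude = second_later ? cycle_count : -cycle_count;
    return second_port + " received its first data " +
           std::to_string(magnitude) + " cycles " +
           (second_later ? "later" : "earlier") + " than " + first_port;
}
```

A call such as `describe_skew(cycle_count, "gr_pl.dataout", "gr_pl.dataout2")` could replace the bare printf in the examples when a more descriptive log line is wanted.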