The event::io_stream_start_difference_cycles enumeration can be used to measure the latency between two PLIO or GMIO ports. After the event::start_profiling() API is called, two performance counters start incrementing each cycle, each waiting for one of two independent nets to receive its first data. After the first data passes through either net, the corresponding performance counter stops. The value read back by event::read_profiling() is the difference between the two performance counters. After event::stop_profiling() is called, the performance counters are cleared and released.
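The counter behavior described above can be illustrated with a small software model. This is only a sketch for building intuition, not the hardware implementation: CounterModel and start_difference_cycles are hypothetical names, and whether the real counter counts the arrival cycle itself is an assumption. The difference is unaffected by that choice as long as both counters follow the same rule. The sign convention (second port minus first port) follows the interpretation given for the example code in this section.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical model of one performance counter: it increments every cycle
// after profiling starts and freezes once its net sees the first data.
struct CounterModel {
    std::int64_t count = 0;
    bool stopped = false;

    void tick(bool first_data_seen) {
        if (stopped) return;
        if (first_data_seen) {   // first data on this net: freeze the counter
            stopped = true;
            return;
        }
        ++count;                 // still waiting for data
    }
};

// Models the value read back for io_stream_start_difference_cycles.
// Arguments are the cycle indices (counted from the start of profiling)
// at which the first and second port receive their first data.
std::int64_t start_difference_cycles(std::int64_t first_port_arrival,
                                     std::int64_t second_port_arrival) {
    assert(first_port_arrival >= 0 && second_port_arrival >= 0);
    CounterModel first, second;
    const std::int64_t last = std::max(first_port_arrival, second_port_arrival);
    for (std::int64_t cycle = 0; cycle <= last; ++cycle) {
        first.tick(cycle == first_port_arrival);
        second.tick(cycle == second_port_arrival);
    }
    // Assumed sign convention: second counter minus first counter.
    return second.count - first.count;
}
```

In this model, data reaching the first port at cycle 5 and the second port at cycle 12 gives a difference of 7 cycles. Swapping the two arrival times gives -7.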
Profile Graph Latency
Graph latency can be defined as the time from receiving the first input data to producing the first output data. Graph latency does not depend on the number of iterations the graph is run for. The following examples show how to profile graph latency using the event API in the AI Engine simulation flow and in the hardware and hardware emulation flows.
event::start_profiling() takes two different PLIO parameters.
AI Engine Simulation
event::handle handle = event::start_profiling(gr_pl.in, gr_pl.dataout, event::io_stream_start_difference_cycles);
if(handle==event::invalid_handle){
    printf("ERROR: Invalid handle. Only two performance counters in an AIE-PL interface tile\n");
    return 1;
}
gr_pl.run(iterations); //Data transfer starts after graph.run()
gr_pl.wait();
long long cycle_count = event::read_profiling(handle); //Difference between the two counters
printf("Latency cycles=: %lld\n", cycle_count);
event::stop_profiling(handle);//Performance counter is released and cleared
Hardware Emulation and Hardware
auto s2mm_run = s2mm(out_bo, nullptr, OUTPUT_SIZE);
event::handle handle = event::start_profiling(gr_pl.in, gr_pl.dataout, event::io_stream_start_difference_cycles);
if(handle==event::invalid_handle){
    printf("ERROR: Invalid handle. Only two performance counters in an AIE-PL interface tile\n");
    return 1;
}
gr_pl.run(iterations);
auto mm2s_run = mm2s(nullptr, OUTPUT_SIZE_MM2S); //input data transfer starts
s2mm_run.wait();//make sure both ports have data transferred
long long cycle_count = event::read_profiling(handle); //Difference between the two counters
printf("Latency cycles=: %lld\n", cycle_count);
event::stop_profiling(handle);//Performance counter is released and cleared
In the hardware and hardware emulation flows, input data transfer starts when mm2s starts. To avoid any overhead that graph.run() can introduce when profiling graph latency, start the PL kernel mm2s after event::start_profiling() and after graph.run().
Profile Latency Difference Between Two Ports
This method is not limited to profiling latency between the input and output ports of the same graph. It can be used to profile latency between any two ports. For example, it can profile the latency between two output ports that share a common input port.
AI Engine Simulation
event::handle handle = event::start_profiling(gr_pl.dataout, gr_pl.dataout2, event::io_stream_start_difference_cycles);
if(handle==event::invalid_handle){
    printf("ERROR: Invalid handle. Only two performance counters in an AIE-PL interface tile\n");
    return 1;
}
gr_pl.run(iterations);
gr_pl.wait();
long long cycle_count = event::read_profiling(handle); //Difference between the two counters
printf("Latency cycles=: %lld\n", cycle_count);
event::stop_profiling(handle);//Performance counter is released and cleared
Hardware Emulation and Hardware
auto s2mm_run = s2mm(out_bo, nullptr, OUTPUT_SIZE);
event::handle handle = event::start_profiling(gr_pl.dataout, gr_pl.dataout2, event::io_stream_start_difference_cycles);
if(handle==event::invalid_handle){
    printf("ERROR: Invalid handle. Only two performance counters in an AIE-PL interface tile\n");
    return 1;
}
gr_pl.run(iterations);
auto mm2s_run = mm2s(nullptr, OUTPUT_SIZE_MM2S); //input data transfer starts
s2mm_run.wait();//make sure both ports have data transferred
long long cycle_count = event::read_profiling(handle); //Difference between the two counters
printf("Latency cycles=: %lld\n", cycle_count);
event::stop_profiling(handle);//Performance counter is released and cleared
In the example code, a positive number indicates that data arrives at gr_pl.dataout2 later than at gr_pl.dataout. A negative number indicates that data arrives at gr_pl.dataout2 earlier than at gr_pl.dataout.
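The sign interpretation above can be captured in a small host-side helper, sketched below. The names ArrivalOrder, classify_arrival, and cycles_to_ns are hypothetical and not part of the event API. Treating a value of zero as "same cycle" is an inference from the counter description rather than something stated here. The conversion to time assumes you know the frequency of the clock the counters run on; the 1.25 GHz figure in the usage note is purely illustrative.

```cpp
#include <cassert>

// Hypothetical classification of the value returned by event::read_profiling()
// when profiling with io_stream_start_difference_cycles.
enum class ArrivalOrder {
    SecondPortLater,    // positive: data reaches the second port later
    SecondPortEarlier,  // negative: data reaches the second port earlier
    SameCycle           // zero: assumed to mean both ports saw data together
};

ArrivalOrder classify_arrival(long long cycle_count) {
    if (cycle_count > 0) return ArrivalOrder::SecondPortLater;
    if (cycle_count < 0) return ArrivalOrder::SecondPortEarlier;
    return ArrivalOrder::SameCycle;
}

// Converts a signed cycle difference to nanoseconds.
// clock_hz is an assumption supplied by the caller: the frequency of the
// clock that drives the performance counters in your design.
double cycles_to_ns(long long cycle_count, double clock_hz) {
    assert(clock_hz > 0.0);
    return static_cast<double>(cycle_count) * 1e9 / clock_hz;
}
```

For example, with an assumed 1.25 GHz counter clock, a reading of 1250 cycles corresponds to 1000 ns, and the sign of the result is preserved for negative readings.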