Graph throughput can be defined as the average number of bytes produced (or
consumed) per second. The event::io_stream_start_to_bytes_transferred_cycles
enumeration can be used to record the number of cycles taken to transfer a
specified amount of data. After event::start_profiling() is called, two
performance counters, performance counter 0 and performance counter 1, work
together. Performance counter 0 starts counting cycles once it receives the
first data. Performance counter 1 increments as it receives data. When
performance counter 1 equals the amount of data specified in
event::start_profiling(), it generates an event that notifies performance
counter 0 to stop. The value read back by event::read_profiling() is the value
of performance counter 0. After performance counter 0 stops, this value
represents the number of cycles taken to transfer the data.
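The interaction between the two counters can be illustrated with a small software model. This is only a sketch for reasoning about the mechanism, not the AI Engine API: the names ProfileResult and model_counters are hypothetical, and whether the hardware counts the cycle in which the first data arrives is an assumption made here for simplicity.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical software model of the two-counter scheme (not the AI Engine API).
struct ProfileResult {
    std::uint64_t cycles;  // value held by the modeled "performance counter 0"
    bool stopped;          // true if "performance counter 1" reached the target
};

// bytes_per_cycle[i] is the number of bytes observed on the stream in cycle i.
// Counter 0 counts cycles starting with the first cycle that carries data
// (an assumption of this model). Counter 1 accumulates bytes. When counter 1
// reaches target_bytes, counter 0 stops and its value is returned.
ProfileResult model_counters(const std::vector<std::uint32_t>& bytes_per_cycle,
                             std::uint64_t target_bytes) {
    assert(target_bytes > 0);
    std::uint64_t counter0 = 0;  // cycles since the first data
    std::uint64_t counter1 = 0;  // bytes transferred so far
    bool started = false;
    for (std::uint32_t bytes : bytes_per_cycle) {
        if (!started && bytes > 0) {
            started = true;  // first data arrives, counter 0 starts
        }
        if (!started) {
            continue;  // idle cycles before the first data are not counted
        }
        ++counter0;
        counter1 += bytes;
        if (counter1 >= target_bytes) {
            return {counter0, true};  // counter 1 notifies counter 0 to stop
        }
    }
    // Target never reached: counter 0 is still running at the end of the trace.
    return {counter0, false};
}
```

In this model, a trace that never delivers the requested number of bytes returns stopped == false, which mirrors the case where performance counter 0 keeps running.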
If the amount of data specified in event::start_profiling() has not been
transferred, performance counter 0 does not stop. After the specified amount
of data has been transferred, performance counter 0 stops. This technique is
useful when you want to profile the time taken to transfer a known amount of
data. However, if additional data is transferred, performance counter 0
continues counting.
Profile Graph Throughput Using the Graph Output
auto s2mm_run = s2mm(out_bo, nullptr, OUTPUT_SIZE);
const int WINDOW_SIZE_in_bytes=8192;
int iterations=999;
//Third parameter is the amount of data to be transferred (in bytes).
event::handle handle = event::start_profiling(gr_pl.dataout, event::io_stream_start_to_bytes_transferred_cycles, WINDOW_SIZE_in_bytes*iterations);
if(handle==event::invalid_handle){
    printf("ERROR: Invalid handle. Only two performance counters are available in an AIE-PL interface tile\n");
    return 1;
}
gr_pl.run(iterations);
s2mm_run.wait();//performance counter 0 stops, assuming s2mm is able to receive all data
long long cycle_count = event::read_profiling(handle);
//cycle_count * 1e-9 treats one cycle as 1 ns (a 1 GHz clock); adjust for the actual clock period
double throughput = (double)WINDOW_SIZE_in_bytes*iterations / (cycle_count * 1e-9); //bytes per second
event::stop_profiling(handle);//Performance counter is released and cleared
In the code above, the host waits for s2mm to complete to ensure that all data
has been transferred through the PLIO. When using the API in the AI Engine
simulation flow, you can use graph.wait() instead. Note that after
graph.wait() returns, additional cycles are still required to transfer data
from the window buffer to the PLIO. One solution is to use a large enough
number of iterations that this overhead becomes negligible. Another solution
is to use graph.wait(<NUM_CYCLES>) with a number of cycles large enough to
ensure that all data is transferred through the PLIO.
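To get a feel for why a large iteration count makes the overhead negligible, the arithmetic can be sketched as follows. This is a rough model under stated assumptions: it treats the extra transfer time as a fixed number of cycles added to a steady per-iteration cost. The function name overhead_fraction and all values passed to it are hypothetical and are not measurements from any device.

```cpp
#include <cassert>
#include <cstdint>

// Fraction of the total cycle count that is due to a fixed overhead
// (for example, the cycles needed to move the last buffer out to the PLIO).
// cycles_per_iteration and overhead_cycles are hypothetical inputs that you
// would estimate for your own design.
double overhead_fraction(std::uint64_t cycles_per_iteration,
                         std::uint64_t iterations,
                         std::uint64_t overhead_cycles) {
    assert(cycles_per_iteration > 0 && iterations > 0);
    const double steady = static_cast<double>(cycles_per_iteration) *
                          static_cast<double>(iterations);
    const double overhead = static_cast<double>(overhead_cycles);
    return overhead / (steady + overhead);
}
```

With an assumed 1000 cycles per iteration and 500 cycles of fixed overhead, a single iteration gives a fraction of about one third, while 999 iterations give roughly 0.05 percent, which is why the error shrinks as the iteration count grows.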
Profile Graph Throughput Using the Graph Input
Once mm2s is asserted, the input net can start receiving data, even before
graph::run is called. One way to profile a PLIO input is to start the PL
kernel (mm2s) only after event::start_profiling(). The following example
shows how to profile graph throughput using the graph input:
const int WINDOW_SIZE_in_bytes=8192;
int iterations=999;
//Third parameter is the amount of data to be transferred (in bytes).
event::handle handle = event::start_profiling(gr_pl.in, event::io_stream_start_to_bytes_transferred_cycles, WINDOW_SIZE_in_bytes*iterations);
if(handle==event::invalid_handle){
    printf("ERROR: Invalid handle. Only two performance counters are available in an AIE-PL interface tile\n");
    return 1;
}
gr_pl.run(iterations);
auto mm2s_run = mm2s(nullptr, OUTPUT_SIZE_MM2S);//After start profiling, send data from mm2s
gr_pl.wait();//performance counter 0 stops, assuming s2mm is able to receive all data
long long cycle_count = event::read_profiling(handle);
//cycle_count * 1e-9 treats one cycle as 1 ns (a 1 GHz clock); adjust for the actual clock period
double throughput = (double)WINDOW_SIZE_in_bytes*iterations / (cycle_count * 1e-9); //bytes per second
event::stop_profiling(handle);//Performance counter is released and cleared
//Variant: read the counter right after gr_pl.wait(), before performance counter 0 has stopped
const int WINDOW_SIZE_in_bytes=8192;
int iterations=999;
//Third parameter is the amount of data to be transferred (in bytes).
event::handle handle = event::start_profiling(gr_pl.in, event::io_stream_start_to_bytes_transferred_cycles, WINDOW_SIZE_in_bytes*iterations);
if(handle==event::invalid_handle){
    printf("ERROR: Invalid handle. Only two performance counters are available in an AIE-PL interface tile\n");
    return 1;
}
gr_pl.run(iterations);
gr_pl.wait();//performance counter 0 does not stop
//Read the performance counter value immediately.
//This assumes the overhead is negligible when the number of iterations is large enough.
long long cycle_count = event::read_profiling(handle);
//cycle_count * 1e-9 treats one cycle as 1 ns (a 1 GHz clock); adjust for the actual clock period
double throughput = (double)WINDOW_SIZE_in_bytes*iterations / (cycle_count * 1e-9); //bytes per second
event::stop_profiling(handle);//Performance counter is released and cleared
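The throughput expression in the examples above multiplies the cycle count by 1e-9, which is only correct when one cycle lasts 1 ns. A minimal sketch of a conversion that takes the clock frequency as an explicit parameter is shown below. The helper name throughput_bytes_per_second is hypothetical, and the clock frequency is a value you must supply for your own device.

```cpp
#include <cassert>
#include <cstdint>

// Convert a byte count and a cycle count into bytes per second.
// clock_hz is the frequency of the clock that the cycle counter runs on;
// it is an input supplied by the caller, not something read from the device.
double throughput_bytes_per_second(std::uint64_t bytes,
                                   std::uint64_t cycles,
                                   double clock_hz) {
    assert(cycles > 0 && clock_hz > 0.0);
    const double seconds = static_cast<double>(cycles) / clock_hz;
    return static_cast<double>(bytes) / seconds;
}
```

For example, the value returned by event::read_profiling() could be passed as cycles together with WINDOW_SIZE_in_bytes*iterations as bytes. Checking that the cycle count is nonzero before dividing also guards against reading the counter before any data has been transferred.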