Graph throughput is the average number of bytes produced (or consumed) per
second. You can use the event::io_stream_start_to_bytes_transferred_cycles enumeration to
record the number of cycles taken to transfer a certain amount of data.
After event::start_profiling() is called, two performance counters,
performance counter 0 and performance counter 1, work together.
Performance counter 0 starts incrementing after it receives the first data.
Performance counter 1 increments as it receives data.
When performance counter 1 equals the amount of data specified in
event::start_profiling(), it generates an event that notifies performance
counter 0 to stop. The value read back by event::read_profiling() is the
performance counter 0 value. After performance counter 0 stops, that value
represents the number of cycles taken to transfer the data.
If the amount of data specified in event::start_profiling() is not
transferred, performance counter 0 does not stop. Performance counter 0 stops
after the specified amount of data is transferred. This technique is useful
when you want to profile the time taken to transfer a known amount of data.
However, performance counter 0 continues to count if additional data is
transferred.
Profile Graph Throughput Using the Graph Output
auto s2mm_run = s2mm(out_bo, nullptr, OUTPUT_SIZE);
const int WINDOW_SIZE_in_bytes = 8192;
int iterations = 999;
// Third parameter is the amount of data to be transferred (in bytes).
event::handle handle = event::start_profiling(gr_pl.dataout, event::io_stream_start_to_bytes_transferred_cycles, WINDOW_SIZE_in_bytes * iterations);
if (handle == event::invalid_handle) {
    printf("ERROR: Invalid handle. Only two performance counters in an AIE-PL interface tile\n");
    return 1;
}
gr_pl.run(iterations);
s2mm_run.wait(); // performance counter 0 stops, assuming s2mm is able to receive all data
long long cycle_count = event::read_profiling(handle);
// cycle_count * 1e-9 is the elapsed time in seconds only if one cycle takes 1 ns (1 GHz clock)
double throughput = (double)WINDOW_SIZE_in_bytes * iterations / (cycle_count * 1e-9); // bytes per second
event::stop_profiling(handle); // performance counter is released and cleared
In this example, the host code waits for s2mm to complete. Waiting ensures
that all data is transferred through PLIO. When using the API in the AI Engine
simulation flow, you can use graph.wait() instead. Note that after
graph.wait() returns, additional cycles are still required to transfer data
from the window buffer to PLIO. One solution is to use a large enough number
of iterations that this overhead becomes negligible. Another solution is to
use graph.wait(<NUM_CYCLES>) with a number of cycles that is long enough to
make sure all data is transferred through PLIO.
Profile Graph Throughput Using the Graph Input
When mm2s is asserted, the input net can start receiving data even before
graph::run. One way to profile a PLIO input is therefore to start the PL
kernel only after event::start_profiling().
The following example shows how to profile graph throughput using the graph
input:
const int WINDOW_SIZE_in_bytes=8192;
int iterations = 999;
// Third parameter is the amount of data to be transferred (in bytes).
event::handle handle = event::start_profiling(gr_pl.in, event::io_stream_start_to_bytes_transferred_cycles, WINDOW_SIZE_in_bytes * iterations);
if (handle == event::invalid_handle) {
    printf("ERROR: Invalid handle. Only two performance counters in an AIE-PL interface tile\n");
    return 1;
}
gr_pl.run(iterations);
auto mm2s_run = mm2s(nullptr, OUTPUT_SIZE_MM2S); // after start_profiling, send data from mm2s
gr_pl.wait(); // performance counter 0 stops, assuming s2mm is able to receive all data
long long cycle_count = event::read_profiling(handle);
// cycle_count * 1e-9 is the elapsed time in seconds only if one cycle takes 1 ns (1 GHz clock)
double throughput = (double)WINDOW_SIZE_in_bytes * iterations / (cycle_count * 1e-9); // bytes per second
event::stop_profiling(handle); // performance counter is released and cleared
You can estimate graph throughput with this method even if the amount of data transferred is not known. For example, you can still profile the graph throughput via graph input if the following conditions are met:
- PL kernels are free-running, and
- The AI Engine-PL interface column of the graph output has run out of performance counters
const int WINDOW_SIZE_in_bytes = 8192;
int iterations = 999;
// Third parameter is the amount of data to be transferred (in bytes).
event::handle handle = event::start_profiling(gr_pl.in, event::io_stream_start_to_bytes_transferred_cycles, WINDOW_SIZE_in_bytes * iterations);
if (handle == event::invalid_handle) {
    printf("ERROR: Invalid handle. Only two performance counters in an AIE-PL interface tile\n");
    return 1;
}
gr_pl.run(iterations);
gr_pl.wait(); // performance counter 0 does not stop
// Read the performance counter value immediately.
// Assumes the overhead is negligible when the number of iterations is large enough.
long long cycle_count = event::read_profiling(handle);
// cycle_count * 1e-9 is the elapsed time in seconds only if one cycle takes 1 ns (1 GHz clock)
double throughput = (double)WINDOW_SIZE_in_bytes * iterations / (cycle_count * 1e-9); // bytes per second
event::stop_profiling(handle); // performance counter is released and cleared