The bandwidth of a platform I/O port can be defined as the average number of
bytes transferred per second, which can be derived as the total number of bytes
transferred divided by the time when the port is transferring or is stalled (for
example, due to back pressure). The following example shows how to profile I/O port
bandwidth using the event API. In the example, gr is the application graph object,
plio_out is the PLIO object connected to the graph output port, and the graph is
designed to produce 256 int32 data samples in eight iterations.
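```cpp
gr.init();
// Count the cycles plio_out spends between the stream running event and the
// stream idle event, that is, cycles spent transferring or stalled.
event::handle handle = event::start_profiling(plio_out, event::io_total_stream_running_to_idle_cycles);
gr.run(8);   // eight iterations produce 256 int32 samples in total
gr.wait();
long long cycle_count = event::read_profiling(handle);
event::stop_profiling(handle);  // release the performance counter for other uses
// Assumes a 1 GHz AI Engine clock, so one cycle lasts 1e-9 s.
double bandwidth = (double)256 * sizeof(int32) / (cycle_count * 1e-9); // bytes per second
```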
In the example, after the graph is initialized, event::start_profiling is called to
configure the AI Engine to count the accumulated clock cycles between the stream
running event and the stream idle event. In other words, it counts the number of
cycles during which the stream port is either running or stalled. The first argument
to event::start_profiling can be a PLIO or a GMIO object; in this case, it is
plio_out. The second argument is an event::io_profiling_option enumeration value, in
this case event::io_total_stream_running_to_idle_cycles. event::start_profiling
returns a handle, which is used later to read the counter value and to stop
profiling. After the graph finishes eight iterations, you can call
event::read_profiling with the handle to read the counter value. When profiling is
done, it is recommended to stop the performance counter by calling
event::stop_profiling with the handle, so that the hardware resources configured for
profiling are released for other uses. Finally, the bandwidth is derived by dividing
the total number of bytes transferred (256 × sizeof(int32)) by the time during which
the stream port is active (cycle_count × 1e-9 seconds, assuming the AI Engine is
running at 1 GHz).