After the graph is compiled using the Vitis tools or aiecompiler, each AI Engine array interface (or shim tile) can be monitored to count specific events. A few profiling events count valid AXI4-Stream data transactions within the AI Engine array interface. When the event APIs are called, the PS issues a sequence of AXI4-MM commands that configure the AI Engine array interface to count those events. These event counters provide a way to measure the system without adding any hardware to it.
The following example uses the io_stream_start_to_bytes_transferred_cycles event API to measure the throughput of the graph. This API uses two performance counters to track both the bytes transferred and the cycles taken. It captures and calculates the sum of the total active, stall, and idle cycles taken to transfer the specified amount of data through the graph. This API can be used on both input and output streams.
gr.init();
event::handle handle = event::start_profiling(plio_out,
event::io_stream_start_to_bytes_transferred_cycles, 256*sizeof(int32));
gr.run(8);
gr.wait();
long long cycle_count = event::read_profiling(handle);
event::stop_profiling(handle);
double throughput = (double)256 * sizeof(int32) / (cycle_count * 1e-9); // bytes per second; the 1e-9 factor treats each cycle as 1 ns (1 GHz clock)
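The 1e-9 factor in the calculation above converts cycles to seconds at one nanosecond per cycle, which holds only for a 1 GHz clock. The following is a minimal sketch of the same arithmetic with the clock frequency made explicit. The helper name throughput_bytes_per_sec and its clock_hz parameter are hypothetical and not part of the ADF event API; the actual AI Engine clock frequency of the target device has to be supplied by the caller.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of the ADF API): converts a cycle count, such
// as the value returned by event::read_profiling(), into bytes per second.
// clock_hz is the frequency of the clock that the cycle counter runs on; the
// caller must supply it (1e9 reproduces the 1e-9 factor used in the example).
inline double throughput_bytes_per_sec(std::uint64_t bytes,
                                       long long cycles,
                                       double clock_hz) {
    assert(cycles > 0 && clock_hz > 0.0);
    // seconds = cycles / clock_hz, so bytes / seconds = bytes * clock_hz / cycles
    return static_cast<double>(bytes) * clock_hz / static_cast<double>(cycles);
}
```

With the values from the example, the call would look like throughput_bytes_per_sec(256 * sizeof(int32), cycle_count, 1e9), assuming a 1 GHz clock.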
You can use an alternative event API when the number of bytes being transferred is unknown. The following example uses the io_stream_running_event_count event API to measure the throughput of the graph. The streams run for a specific interval of time, and the number of stream active events is captured.
...
...
using namespace adf;
event::handle handle_0;
PLIO duc_plio[2] = {*duc_in0, *duc_out0};
d=0;
while(d < NUM_DUC_SLAVES) {
    long long throughput_out_min = 990000000; // initial value set to a high number
    long long throughput_out_max = 0;
    int iter=0;
    while(iter < 5) {
        long long count_start, count_end;
        long long throughput;
        handle_0 = event::start_profiling(duc_plio[d], event::io_stream_running_event_count);
        count_start = event::read_profiling(handle_0);
        // precision of usleep depends on the Linux system call
        usleep(1000000); // 1 s
        count_end = event::read_profiling(handle_0);
        event::stop_profiling(handle_0);
        // events counted during the interval; equal readings mean zero events
        if (count_end >= count_start) throughput = (count_end-count_start);
        else throughput = (count_end-count_start+0x100000000); // rollover correction for 32-bit performance counter
        if (throughput<throughput_out_min) throughput_out_min = throughput;
        if (throughput>throughput_out_max) throughput_out_max = throughput;
        iter++;
    }
    // %lld matches the signed long long arguments
    printf("[throughput] %d\tMin:%lld\tMax:%lld\tRange:%lld\n", d, throughput_out_min, throughput_out_max, throughput_out_max-throughput_out_min );
    d++;
}
printf("[main] Performance measurements Done ... \n");
...
...
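The rollover correction in the loop above can also be expressed with unsigned 32-bit arithmetic, which wraps modulo 2^32 by definition. The following is a minimal sketch under that assumption. Both helper names are hypothetical and not part of the ADF event API, and the bytes_per_event parameter is an assumption: how many bytes one counted stream event represents depends on the interface configuration and should be confirmed for the target design.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: difference between two readings of a free-running
// 32-bit counter. Correct as long as the counter wrapped at most once
// between the two readings.
inline std::uint32_t counter_delta32(std::uint32_t start, std::uint32_t end) {
    // Converting the result to uint32_t reduces it modulo 2^32, so a
    // wrapped counter needs no explicit branch.
    return static_cast<std::uint32_t>(end - start);
}

// Hypothetical helper: scales an event count taken over interval_s seconds
// to bytes per second. bytes_per_event is an assumption supplied by the caller.
inline double stream_bytes_per_sec(std::uint32_t events,
                                   unsigned bytes_per_event,
                                   double interval_s) {
    assert(interval_s > 0.0);
    return static_cast<double>(events) * bytes_per_event / interval_s;
}
```

In the loop above, the two readings would be truncated to 32 bits before the subtraction, for example counter_delta32((std::uint32_t)count_start, (std::uint32_t)count_end). Because usleep only guarantees a minimum delay, measuring the elapsed interval with a wall-clock timer instead of assuming exactly one second may give a more reliable rate.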
For more information, see the AI Engine Tools and Flows User Guide (UG1076).