Xilinx provides the event::io_stream_running_event_count enumeration to count running events on a stream, which correspond to the number of samples sent through the net. When event::start_profiling() is called with this enumeration, the performance counter starts. The counter increments each time a data sample passes through the AI Engine - PL interface, so the value read back by event::read_profiling() is the number of samples that have been sent through that interface.
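To make these counter semantics concrete, the following is a minimal C++ model of one AIE-PL interface tile and its two stream performance counters. MockInterfaceTileCounters, its methods, and kInvalidHandle are hypothetical stand-ins for event::start_profiling(), event::read_profiling(), event::stop_profiling(), and event::invalid_handle. The model only mimics the behavior described in this section (one increment per sample, at most two counters per interface tile, counter released and cleared on stop); it is not the ADF API or its implementation.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical model of the two stream performance counters in one AIE-PL
// interface tile. It mimics the documented behavior only; it is NOT the ADF
// API or its implementation.
class MockInterfaceTileCounters {
public:
    // Stand-in for event::invalid_handle.
    static constexpr int kInvalidHandle = -1;

    // Like event::start_profiling(port, io_stream_running_event_count):
    // claim a free counter, attach it to a port, and zero it.
    int start(int port) {
        for (std::size_t h = 0; h < kCounters; ++h) {
            if (!in_use_[h]) {
                in_use_[h] = true;
                port_[h] = port;
                count_[h] = 0;
                return static_cast<int>(h);
            }
        }
        return kInvalidHandle;  // both counters are already taken
    }

    // One data sample crosses the AI Engine - PL interface on `port`:
    // every running counter attached to that port increments by one.
    void on_sample(int port) {
        for (std::size_t h = 0; h < kCounters; ++h) {
            if (in_use_[h] && port_[h] == port) {
                ++count_[h];
            }
        }
    }

    // Like event::read_profiling(): samples seen since start().
    long long read(int handle) const {
        assert(valid(handle) && in_use_[handle]);
        return count_[handle];
    }

    // Like event::stop_profiling(): release and clear the counter.
    void stop(int handle) {
        assert(valid(handle));
        in_use_[handle] = false;
        count_[handle] = 0;
    }

private:
    static constexpr std::size_t kCounters = 2;  // two counters per tile

    static bool valid(int handle) {
        return handle >= 0 && static_cast<std::size_t>(handle) < kCounters;
    }

    std::array<bool, kCounters> in_use_{};
    std::array<int, kCounters> port_{};
    std::array<long long, kCounters> count_{};
};
```

In this model a third start() call on the same tile returns kInvalidHandle, which mirrors why the examples below check for event::invalid_handle before using the handle.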
Method Used to Count Samples Sent and Received
This method can be used to count the number of samples sent or received before the graph stalls. The following example counts the number of samples sent from the AI Engine through the graph output port gr_pl.dataout:
event::handle handle = event::start_profiling(gr_pl.dataout, event::io_stream_running_event_count);
if(handle==event::invalid_handle){
printf("ERROR: Invalid handle. Only two performance counters are available in an AIE-PL interface tile\n");
return 1;
}
gr_pl.run(iterations);
sleep(2);//Give the graph enough time to transfer data
long long sample_count = event::read_profiling(handle);
printf("Sample number: %lld\n", sample_count);
event::stop_profiling(handle);//Performance counter is released and cleared
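Because this method is meant to show how far the graph got before it stalled, a natural follow-up is to compare the count with the number of samples the graph should have transferred. The sketch below assumes you know how many samples one graph iteration moves through the port; samples_per_iteration is a hypothetical parameter taken from your own design, and report_sample_count is a hypothetical helper, not part of ADF. A short count does not prove a stall: it can also mean the host did not wait long enough before reading the counter.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical helper (not part of ADF): compare the count returned by
// event::read_profiling() with the number of samples the graph should move.
// samples_per_iteration is an assumption you take from your own design.
// Returns true when fewer samples than expected passed through the port,
// which can indicate a stall (or simply a wait that was too short).
bool report_sample_count(long long sample_count, int iterations,
                         long long samples_per_iteration) {
    assert(iterations >= 0 && samples_per_iteration > 0);
    const long long expected =
        static_cast<long long>(iterations) * samples_per_iteration;
    // %lld matches long long; %d would be undefined behavior here.
    std::printf("Sample number: %lld of %lld expected (%lld full iterations)\n",
                sample_count, expected, sample_count / samples_per_iteration);
    return sample_count < expected;
}
```

It could be called with the value returned by event::read_profiling(handle), the same iterations passed to gr_pl.run(), and your design's per-iteration sample count.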
The following example counts the number of samples received by the AI Engine through the graph input port gr_pl.in:
event::handle handle = event::start_profiling(gr_pl.in, event::io_stream_running_event_count);
if(handle==event::invalid_handle){
printf("ERROR: Invalid handle. Only two performance counters are available in an AIE-PL interface tile\n");
return 1;
}
auto mm2s_run = mm2s(nullptr, OUTPUT_SIZE_MM2S);//Send data from mm2s only after profiling has started, so that no samples are missed
gr_pl.run(iterations);
sleep(2);//Give the graph enough time to transfer data
long long sample_count = event::read_profiling(handle);
printf("Sample number: %lld\n", sample_count);
event::stop_profiling(handle);//Performance counter is released and cleared
Profile Port Throughput
Port throughput is defined as the number of samples transferred in a specific time period. After the graph has started running, the following code can be inserted in the host code to measure the port throughput. To profile the steady-state throughput of your design, make sure the data transfer has reached a stable state before you start profiling.
int wait_time_us=20000;
event::handle handle = event::start_profiling(gr_pl.dataout, event::io_stream_running_event_count);
if(handle==event::invalid_handle){
printf("ERROR: Invalid handle. Only two performance counters are available in an AIE-PL interface tile\n");
return 1;
}
long long count0 = event::read_profiling(handle);
usleep(wait_time_us);
long long count1 = event::read_profiling(handle);
event::stop_profiling(handle);
long long samples = count1 - count0;
std::cout << "num running samples: " << samples << std::endl;
std::cout << "Throughput: " << (double)samples / wait_time_us << " MSPS " << std::endl;//samples per microsecond equals mega-samples per second
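usleep() guarantees only a minimum sleep time, so dividing by the nominal wait_time_us can slightly overstate throughput when the host oversleeps. A minimal sketch, assuming you wrap the counter read in a callable, times the interval with std::chrono::steady_clock instead. measure_throughput_msps is a hypothetical helper, and read_counter is assumed to be something like [&]{ return event::read_profiling(handle); }.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical helper: measure port throughput from two counter reads while
// timing the interval with steady_clock rather than trusting the sleep length.
// read_counter is an assumed stand-in for a call to event::read_profiling().
double measure_throughput_msps(const std::function<long long()>& read_counter,
                               std::chrono::microseconds wait_time) {
    const auto t0 = std::chrono::steady_clock::now();
    const long long count0 = read_counter();
    std::this_thread::sleep_for(wait_time);  // blocks for at least wait_time
    const long long count1 = read_counter();
    const auto t1 = std::chrono::steady_clock::now();

    // The timed window encloses both reads, so the result errs low, not high.
    const double elapsed_us =
        std::chrono::duration<double, std::micro>(t1 - t0).count();
    assert(elapsed_us > 0.0);
    // Samples per microsecond is numerically equal to mega-samples per second.
    return static_cast<double>(count1 - count0) / elapsed_us;
}
```

The trade-off is that the timed window also includes the two read calls, so this version is conservative; for a 20 ms window that overhead is usually small.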
Xilinx recommends that you run the design for many iterations in hardware to ensure accuracy. The accuracy of this method can vary in hardware emulation.
This profiling method also applies to the AI Engine simulator. Replace usleep with the SystemC wait function, and use a much shorter wait time, because simulation runs much slower than hardware. For example, the usleep call in the preceding code can be replaced with the following function call for the AI Engine simulator:
wait(20,SC_US);
Set wait_time_us to the same value (20 in this example) so that the throughput calculation still divides by the time actually waited.
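The throughput formula is only correct when its divisor matches the wait that was actually performed, which is easy to break when switching between a 20000 us wait in hardware and a 20 us wait in simulation. A minimal sketch, using a hypothetical helper named throughput_msps, takes the wait as a std::chrono duration so the unit is carried with the value instead of living in a separate integer.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper: convert a sample delta and a wait duration in any
// std::chrono unit into mega-samples per second (MSPS).
template <typename Rep, typename Period>
double throughput_msps(long long samples,
                       std::chrono::duration<Rep, Period> wait_time) {
    // Normalize the wait to microseconds as a double.
    const double us =
        std::chrono::duration<double, std::micro>(wait_time).count();
    assert(us > 0.0);
    return static_cast<double>(samples) / us;  // samples/us == MSPS
}
```

For example, 2,000,000 samples over std::chrono::microseconds(20000) and 2,000 samples over std::chrono::microseconds(20) both give 100 MSPS.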