Profiling Running Event and Graph Throughput - 2022.1 English

Versal ACAP AI Engine Programming Environment User Guide (UG1076)

Document ID: UG1076
Release Date: 2022-05-25
Version: 2022.1 English

Xilinx provides the event::io_stream_running_event_count enumeration to count running events, where each running event corresponds to one sample sent through the net. When event::start_profiling() is called, the performance counter starts, and it increments each time a data sample passes through the AI Engine-PL interface. The value read back by event::read_profiling() is the number of samples that have passed through that interface since profiling started.

Method Used to Count Samples Sent and Received

This method can be used to count the number of samples sent or received before a graph stall. The following example counts the number of samples sent through the graph output port gr_pl.dataout:
event::handle handle = event::start_profiling(gr_pl.dataout, event::io_stream_running_event_count);
if (handle == event::invalid_handle) {
    printf("ERROR: Invalid handle. Only two performance counters are available in an AI Engine-PL interface tile.\n");
    return 1;
}
gr_pl.run(iterations);
sleep(2); // Wait long enough for the transfer to finish or stall
long long sample_count = event::read_profiling(handle);
printf("Sample number: %lld\n", sample_count); // %lld matches long long
event::stop_profiling(handle); // Performance counter is released and cleared
The following example counts the number of samples received through the graph input port gr_pl.in. Profiling starts before the mm2s PL kernel begins sending data, so every sample is counted:
event::handle handle = event::start_profiling(gr_pl.in, event::io_stream_running_event_count);
if (handle == event::invalid_handle) {
    printf("ERROR: Invalid handle. Only two performance counters are available in an AI Engine-PL interface tile.\n");
    return 1;
}
auto mm2s_run = mm2s(nullptr, OUTPUT_SIZE_MM2S); // Start mm2s only after profiling has started
gr_pl.run(iterations);
sleep(2); // Wait long enough for the transfer to finish or stall
long long sample_count = event::read_profiling(handle);
printf("Sample number: %lld\n", sample_count); // %lld matches long long
event::stop_profiling(handle); // Performance counter is released and cleared

Profile Port Throughput

Port throughput is defined as the number of samples transferred in a specific time period. After the graph starts running, the following code can be inserted into the host code to measure port throughput. To profile the port throughput of your design in steady state, make sure the data transfer has reached a stable state before you start profiling.
int wait_time_us = 20000; // Measurement window in microseconds (20 ms)
event::handle handle = event::start_profiling(gr_pl.dataout, event::io_stream_running_event_count);
if (handle == event::invalid_handle) {
    printf("ERROR: Invalid handle. Only two performance counters are available in an AI Engine-PL interface tile.\n");
    return 1;
}
long long count0 = event::read_profiling(handle); // Counter value at start of window
usleep(wait_time_us);
long long count1 = event::read_profiling(handle); // Counter value at end of window
event::stop_profiling(handle);
long long samples = count1 - count0; // Samples transferred during the window
std::cout << "num running samples: " << samples << std::endl;
// Samples per microsecond equals millions of samples per second (MSPS)
std::cout << "Throughput: " << (double)samples / wait_time_us << " MSPS" << std::endl;

Xilinx recommends that you run the design for many iterations in hardware to ensure accuracy. The accuracy of this method can vary in hardware emulation.

This profiling method also applies to the AI Engine simulator. Replace usleep with the SystemC wait function, and use a much shorter wait time, because simulation runs far more slowly than hardware. For example, the usleep call in the preceding code can be replaced with the following call for the AI Engine simulator:
wait(20, SC_US);
Set wait_time_us to 20 as well, so that the throughput calculation divides by the actual measurement window.