Profiling Running Event and Graph Throughput - 2024.1 English

AI Engine Tools and Flows User Guide (UG1076)

Document ID: UG1076
Release Date: 2024-06-27
Version: 2024.1 English

AMD provides the event::io_stream_running_event_count enumeration to count the running event, which corresponds to the number of samples sent through the net. Calling event::start_profiling() starts the performance counter, which increments each time a data sample passes through the AI Engine - PL interface. The value read back by event::read_profiling() is the number of samples that have been sent through that interface.

Method Used to Count Samples Sent and Received

This method can be used to count the number of samples sent or received before a graph stall. The following example counts the number of samples received from the AI Engine port before a graph stall:
event::handle handle = event::start_profiling(gr_pl.dataout, event::io_stream_running_event_count);
if(handle==event::invalid_handle){
    printf("ERROR: Invalid handle. Only two performance counters are available in an AIE-PL interface tile\n");
    return 1;
} 
gr_pl.run(iterations);
sleep(2); // Wait long enough for the graph to finish or stall
long long sample_count = event::read_profiling(handle);
printf("Sample number: %lld\n", sample_count); // %lld matches the long long return type
event::stop_profiling(handle); // Performance counter is released and cleared
The following example counts the number of samples sent to the AI Engine port before a graph stall:
event::handle handle = event::start_profiling(gr_pl.in, event::io_stream_running_event_count);
if(handle==event::invalid_handle){
    printf("ERROR: Invalid handle. Only two performance counters are available in an AIE-PL interface tile\n");
    return 1;
} 
auto mm2s_run = mm2s(nullptr, OUTPUT_SIZE_MM2S); // Start sending data from mm2s only after profiling has started
gr_pl.run(iterations);

sleep(2); // Wait long enough for the graph to finish or stall
long long sample_count = event::read_profiling(handle);
printf("Sample number: %lld\n", sample_count); // %lld matches the long long return type
event::stop_profiling(handle); // Performance counter is released and cleared
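
The fixed two-second sleep in the preceding examples is only a guess at how long the graph needs. One possible alternative, shown here as a minimal sketch and not as an AMD-documented flow, is to poll the counter until its value stops changing. The function name wait_for_stable_count and all of its parameters are hypothetical. The read_count callable stands in for a call such as event::read_profiling(handle), so the sketch can be compiled and tried without AI Engine hardware.

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical helper (not part of the ADF API).
// Polls read_count until the value is unchanged for stable_reads consecutive
// reads, or until max_polls reads have been made after the initial one.
// Returns the last distinct value observed.
long long wait_for_stable_count(const std::function<long long()>& read_count,
                                std::chrono::microseconds poll_interval,
                                int stable_reads,
                                int max_polls) {
    long long last = read_count();
    int unchanged = 0;
    for (int i = 0; i < max_polls && unchanged < stable_reads; ++i) {
        std::this_thread::sleep_for(poll_interval);
        const long long now = read_count();
        if (now == last) {
            ++unchanged;      // no new samples since the previous read
        } else {
            unchanged = 0;    // traffic is still flowing; restart the streak
            last = now;
        }
    }
    return last;
}
```

In host code, the callable could be a lambda such as [&]{ return event::read_profiling(handle); }. A count that has stopped changing does not by itself distinguish a stalled graph from one that has finished, so compare the result against the number of samples you expected.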

Profile Port Throughput

Port throughput is defined as the number of samples transferred in a given time period. To measure it, insert the following code in the host code after the graph has been started. To profile steady-state port throughput for your design, make sure the data transfer has reached a stable state before you start profiling.
gr_pl.run(iterations); // The graph may also have been started during device boot-up
usleep(100); // Wait long enough (here 100 us) for I/O activity to reach a steady state
int wait_time_us=20000;
event::handle handle = event::start_profiling(gr_pl.dataout, event::io_stream_running_event_count);
if(handle==event::invalid_handle){
    printf("ERROR: Invalid handle. Only two performance counters are available in an AIE-PL interface tile\n");
    return 1;
} 
long long count0 = event::read_profiling(handle); 
usleep(wait_time_us); 
long long count1 = event::read_profiling(handle); 
event::stop_profiling(handle); 
long long samples = count1 - count0; 
std::cout << "num running samples: " << samples << std::endl;
std::cout << "Throughput: " << (double)samples / wait_time_us << " MSPS " << std::endl;
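
The preceding code divides by the requested wait time, but usleep only guarantees a minimum delay, so the real window can be longer and the reported throughput slightly too high. The following sketch shows one way to divide by the measured elapsed time instead. The names throughput_msps and measure_msps are hypothetical and are not part of the ADF API. The read_count callable is a stand-in for event::read_profiling(handle).

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical helper: converts two counter readings and the elapsed time
// between them into millions of samples per second (samples per microsecond).
double throughput_msps(long long count0, long long count1,
                       std::chrono::duration<double, std::micro> elapsed) {
    if (elapsed.count() <= 0.0) {
        return 0.0;  // avoid dividing by zero on an empty window
    }
    return static_cast<double>(count1 - count0) / elapsed.count();
}

// Hypothetical helper: reads the counter, sleeps for roughly `window`,
// reads it again, and divides by the time that actually elapsed.
double measure_msps(const std::function<long long()>& read_count,
                    std::chrono::microseconds window) {
    using clock = std::chrono::steady_clock;
    const long long count0 = read_count();
    const auto t0 = clock::now();   // timestamp taken right after the first read
    std::this_thread::sleep_for(window);
    const long long count1 = read_count();
    const auto t1 = clock::now();   // timestamp taken right after the second read
    return throughput_msps(count0, count1, t1 - t0);
}
```

For example, 20000 samples counted over a measured window of 20000 us gives 1 MSPS. The timestamps still include the latency of the counter reads, so a longer window reduces the relative error.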

AMD recommends that you run the design for many iterations in hardware to ensure accuracy. The accuracy of this method can vary in hardware emulation.

This profiling method also applies to the AI Engine simulator. Replace usleep with the SystemC wait function. The wait time can be shorter than in hardware because the design runs more slowly in simulation. For example, the usleep call in the preceding code can be replaced with the following function call for the AI Engine simulator, which reduces simulation time:
wait(20,SC_US);