Profiling - 2024.1 English

AI Engine System Software Driver Reference Manual (UG1642)

Document ID: UG1642
Release Date: 2024-05-30
Version: 2024.1 English

The AI Engine has performance counters that can be used for profiling. The AI Engine core has four performance counters and the AI Engine memory has two. The AI Engine driver provides APIs for performance counter configuration that let you do the following:

  • Configure the performance counters with start event and stop event
  • Read the performance counter values
  • Reset the performance counters

The AI Engine system software driver provides a collection of profiling APIs, which are exposed through the XRT APIs so that you can profile the AI Engine design. The sample that follows shows how to use the profiling APIs in the host code to measure performance parameters of the AI Engine design.

For more information on different performance metrics and details about the host APIs, refer to Event Profile APIs for Graph Inputs and Outputs in the AI Engine Tools and Flows User Guide (UG1076).

To profile the design and calculate the port throughput, add the profiling APIs to the host code. See the example tutorial code for the code changes required for port throughput calculation. The following example code profiles the graph throughput:

const int buffer_sizeIn_bytes = 512;
// Start counting cycles from the first stream transfer on mygraph.out0
// until buffer_sizeIn_bytes*NIterations bytes have been transferred.
event::handle handle = event::start_profiling(mygraph.out0, event::io_stream_start_to_bytes_transferred_cycles, buffer_sizeIn_bytes*NIterations);
if (handle == event::invalid_handle) {
    printf("ERROR: Invalid handle. Only two performance counters are available in an AIE-PL interface tile\n");
    return 1;
}
mygraph.run(NIterations);
mygraph.end();
...
...
s2mm_1_rhdl.wait();

long long cycle_count = event::read_profiling(handle);
std::cout << "cycle count:" << cycle_count << std::endl;
event::stop_profiling(handle); // Performance counter is released and cleared
// cycle_count * 0.8 is the elapsed time in ns (0.8 ns per cycle, that is, a 1.25 GHz clock);
// multiplying by 1e-3 converts it to microseconds, so the result is in bytes/us = MB/s.
double throughput = (double)buffer_sizeIn_bytes*NIterations / (cycle_count * 0.8 * 1e-3); // MB/s
std::cout << "Throughput of the graph: " << throughput << " MB/s" << std::endl;

Here is a sample output:

run mm2s
run s2mm
Register XRT
graph run
graph end
After MM2S wait
After S2MM_1 wait
cycle count:2965
Throughput of the graph: 1510.96 MB/s
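
The throughput arithmetic in the host code can be checked on its own. The following is a minimal sketch, not part of the AI Engine driver or XRT APIs: the helper name `throughput_mb_per_s` and its parameters are hypothetical, and the default of 0.8 ns per cycle is simply the constant used in the example above. It treats 1 MB as 10^6 bytes, so bytes per microsecond equals MB/s.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper for illustration only; it is not an AI Engine or XRT API.
// bytes        : total bytes transferred while the counter was running
// cycles       : value returned by event::read_profiling
// ns_per_cycle : clock period in ns (assumed 0.8 ns, as in the example above)
double throughput_mb_per_s(std::uint64_t bytes,
                           std::uint64_t cycles,
                           double ns_per_cycle = 0.8) {
    assert(cycles > 0 && ns_per_cycle > 0.0);
    // Elapsed time in microseconds: cycles * ns/cycle * 1e-3 us/ns.
    const double elapsed_us = static_cast<double>(cycles) * ns_per_cycle * 1e-3;
    // Bytes per microsecond is numerically equal to MB/s (1 MB = 1e6 bytes).
    return static_cast<double>(bytes) / elapsed_us;
}
```

As a usage note, the sample output above (2965 cycles, 1510.96 MB/s) is consistent with a total of 3584 bytes, for example 7 iterations of 512 bytes. That iteration count is inferred from the numbers and is not stated in the example; with it, `throughput_mb_per_s(3584, 2965)` evaluates to approximately 1510.96.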