AMD provides the event::io_stream_running_event_count enumeration to count the running event, which corresponds to the number of samples sent through the net. When event::start_profiling() is called, the performance counter starts. The counter increments each time a data sample passes through the AI Engine - PL interface. The value read back by event::read_profiling() is the number of samples that have been sent through that interface.
Method Used to Count Samples Sent and Received
This method can be used to count the number of samples sent or received prior to a graph stall. The following example counts the number of samples received from the AI Engine port prior to a graph stall:
event::handle handle = event::start_profiling(gr_pl.dataout, event::io_stream_running_event_count);
if(handle==event::invalid_handle){
    printf("ERROR: Invalid handle. Only two performance counters are available in an AIE-PL interface tile\n");
    return 1;
}
gr_pl.run(iterations);
sleep(2); // Wait long enough for the graph to run (or stall)
long long sample_count = event::read_profiling(handle);
printf("Sample number: %lld\n", sample_count); // %lld matches the long long return type
event::stop_profiling(handle); // Performance counter is released and cleared
The following example counts the number of samples sent to the AI Engine port prior to a graph stall:
event::handle handle = event::start_profiling(gr_pl.in, event::io_stream_running_event_count);
if(handle==event::invalid_handle){
    printf("ERROR: Invalid handle. Only two performance counters are available in an AIE-PL interface tile\n");
    return 1;
}
auto mm2s_run = mm2s(nullptr, OUTPUT_SIZE_MM2S); // Start sending data from mm2s only after profiling has started
gr_pl.run(iterations);
sleep(2); // Wait long enough for the graph to run (or stall)
long long sample_count = event::read_profiling(handle);
printf("Sample number: %lld\n", sample_count); // %lld matches the long long return type
event::stop_profiling(handle); // Performance counter is released and cleared
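Once a sample count has been read back, it can help to translate it into a position within the run. The following is a minimal sketch, not part of the AI Engine API: the helper locate_stall, the StallPosition struct, and the parameter samples_per_iteration are hypothetical names introduced for illustration. It assumes that the port transfers a fixed, known number of samples per graph iteration, expressed in the same unit that event::read_profiling() reports.

```cpp
#include <cassert>

// Hypothetical helper, not part of the AI Engine API.
// Describes how far a run progressed on one port before it stopped.
struct StallPosition {
    long long full_iterations;   // iterations whose samples were all counted
    long long leftover_samples;  // samples counted in the partially completed iteration
    bool complete;               // true if every expected sample was counted
};

// Assumption: the port moves exactly samples_per_iteration samples per graph iteration.
StallPosition locate_stall(long long sample_count,
                           long long samples_per_iteration,
                           long long iterations) {
    assert(samples_per_iteration > 0 && iterations >= 0 && sample_count >= 0);
    const long long expected = samples_per_iteration * iterations;
    if (sample_count >= expected) {
        return {iterations, 0, true};  // nothing missing on this port
    }
    return {sample_count / samples_per_iteration,
            sample_count % samples_per_iteration,
            false};
}
```

For instance, with 1024 samples per iteration and 4 iterations requested, a count of 2500 would indicate that two full iterations passed through the port and that the third stopped after 452 samples.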
Profile Port Throughput
Port throughput is defined as the number of samples transferred in a specific
time period. After the graph runs, the following code can be inserted in the host
code to measure the port throughput. In order to profile port throughput for your
design in steady state, you must ensure the data transfer is in a stable state prior
to profiling the port throughput.
gr_pl.run(iterations); // The graph may also have been started during device boot-up
usleep(100); // Wait long enough (here 100 us) for I/O activity to reach steady state
int wait_time_us=20000;
event::handle handle = event::start_profiling(gr_pl.dataout, event::io_stream_running_event_count);
if(handle==event::invalid_handle){
    printf("ERROR: Invalid handle. Only two performance counters are available in an AIE-PL interface tile\n");
    return 1;
}
long long count0 = event::read_profiling(handle);
usleep(wait_time_us);
long long count1 = event::read_profiling(handle);
event::stop_profiling(handle);
long long samples = count1 - count0;
std::cout << "num running samples: " << samples << std::endl;
// Samples per microsecond is numerically equal to mega-samples per second (MSPS)
std::cout << "Throughput: " << (double)samples / wait_time_us << " MSPS " << std::endl;
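The arithmetic in the preceding code can be isolated into a small helper, which makes the unit conversion explicit: samples divided by microseconds gives mega-samples per second. This is a sketch only. The function names throughput_msps and throughput_mbytes_per_s are hypothetical, and the bytes_per_sample parameter is an assumption about your port's data type rather than something the profiling API reports.

```cpp
#include <cassert>

// Hypothetical helper: throughput in MSPS from two counter reads taken
// interval_us microseconds apart. Samples per microsecond equals MSPS.
double throughput_msps(long long count0, long long count1, long long interval_us) {
    assert(interval_us > 0 && count1 >= count0);
    return static_cast<double>(count1 - count0) / static_cast<double>(interval_us);
}

// Hypothetical helper: converts MSPS to MB/s (10^6 bytes per second).
// Assumption: the caller knows the width of one sample in bytes.
double throughput_mbytes_per_s(double msps, int bytes_per_sample) {
    assert(bytes_per_sample > 0);
    return msps * bytes_per_sample;
}
```

In the host code, this could be called as throughput_msps(count0, count1, wait_time_us). If 5000 samples are counted over 20000 us, the result is 0.25 MSPS, which corresponds to 1 MB/s for 4-byte samples.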
AMD recommends that you run the design for many iterations in hardware to ensure accuracy. The accuracy of this method can vary in hardware emulation.
This profiling method also applies to the AI Engine simulator. Replace usleep with the SystemC wait function. The wait time may need to be smaller, depending on the simulation time, because the design runs more slowly in simulation. For example, the usleep(wait_time_us) call in the preceding code can be replaced with the following function call for the AI Engine simulator, with a reduced simulation time (set wait_time_us to the same interval so that the throughput calculation remains consistent):
wait(20,SC_US);