A deadlock that does not appear in the AI Engine simulator or hardware emulation flows might still appear in the hardware flow.
The following PS code profiles how much data is transferred on the input and output ports:
xrt::aie::profiling handle(device), handle2(device);
handle.start(xrt::aie::profiling::profiling_option::io_stream_running_event_count, "gr.dataout", "", 0);
handle2.start(xrt::aie::profiling::profiling_option::io_stream_running_event_count, "gr.in", "", 0);
// Start the PL kernels only after profiling has started
auto s2mm_run = s2mm(out_bo, nullptr, OUTPUT_SIZE); // first run for s2mm has started
auto mm2s_run = mm2s(in_bo, nullptr, OUTPUT_SIZE);
auto ghdl = xrt::graph(device, uuid, "gr");
ghdl.run(4);
// Wait for the graph
ghdl.end(5); // wait for AIE kernel to complete or 5 milliseconds
long long data_out_count = handle.read();
long long data_in_count = handle2.read();
handle.stop();
handle2.stop();
std::cout<<"Output data received:"<<data_out_count<<std::endl;
std::cout<<"Input data sent:"<<data_in_count<<std::endl;
Note: Start mm2s only after handle.start() and handle2.start() have been called. Otherwise, data transfer begins as soon as mm2s starts, which is before profiling is active and before ghdl.run(4), so that data might not be counted.
The output is similar to:
Output data received:0
Input data sent:104
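You can estimate the status of the design based on how much data has been transferred for the input and output. For example, in the output above the input count is nonzero while the output count is 0, which suggests that the design accepted input but stalled before producing any output. You can replace the timed graph wait in the above code (ghdl.end(5)) with the sleep or usleep APIs to wait for a fixed amount of time, chosen according to the scale of the design.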
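As a rough illustration of this kind of estimate, the following sketch samples the input and output counts twice, sleeps in between, and classifies what it sees. It is not part of XRT: `TransferSample`, `TransferState`, `classify`, and `probe` are hypothetical names introduced here, and the classification rules are an assumed heuristic, not a definitive deadlock test. The counters are passed in as callables so that the logic does not depend on the hardware.

```cpp
#include <chrono>
#include <cstdint>
#include <functional>
#include <thread>

// Hypothetical types for illustration only; not part of the XRT API.
struct TransferSample {
    std::uint64_t in_count;   // count read from the input port profile
    std::uint64_t out_count;  // count read from the output port profile
};

enum class TransferState {
    Progressing,       // at least one count changed between the two samples
    NoInputAccepted,   // nothing moved and no input was ever counted
    NoOutputProduced,  // input was counted, output stayed at 0: possible stall
    Idle               // nothing moved: finished or stalled; compare the
                       // counts against the expected totals to tell which
};

// Assumed heuristic: compare two samples taken some time apart.
TransferState classify(const TransferSample& first, const TransferSample& second) {
    const bool moved = second.in_count != first.in_count ||
                       second.out_count != first.out_count;
    if (moved) return TransferState::Progressing;
    if (second.in_count == 0) return TransferState::NoInputAccepted;
    if (second.out_count == 0) return TransferState::NoOutputProduced;
    return TransferState::Idle;
}

// Read both counters, sleep for the given interval, read again, and classify.
TransferState probe(const std::function<std::uint64_t()>& read_in,
                    const std::function<std::uint64_t()>& read_out,
                    std::chrono::milliseconds interval) {
    const TransferSample first{read_in(), read_out()};
    std::this_thread::sleep_for(interval);
    const TransferSample second{read_in(), read_out()};
    return classify(first, second);
}
```

With the profiling handles from the code above, the two callables could be lambdas that wrap handle2.read() (input) and handle.read() (output). The interval is a design-dependent choice. A result of NoOutputProduced corresponds to the sample output shown earlier (input 104, output 0).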
If necessary, you can insert an Integrated Logic Analyzer (ILA) to probe the interfaces of the PL kernels and determine the running status of the AI Engine and PL kernels.
Refer to the AI Engine Status Analysis for instructions on how to use the Vitis Analyzer to understand the AI Engine status in both hardware and hardware emulation.