AI Engine Deadlock Detection in the Hardware Flow - 2025.2 English - XD100

Vitis Tutorials: AI Engine Development (XD100)

Document ID: XD100
Release Date: 2025-12-05
Version: 2025.2 English

Even if a deadlock does not appear in the AI Engine simulator or hardware emulation flows, it can still occur in the hardware flow.

The following PS code profiles how much data has been transferred on the input and output:

```
xrt::aie::profiling handle(device), handle2(device);
handle.start(xrt::aie::profiling::profiling_option::io_stream_running_event_count, "gr.dataout", "", 0);
handle2.start(xrt::aie::profiling::profiling_option::io_stream_running_event_count, "gr.in", "", 0);

//kernel run
auto s2mm_run = s2mm(out_bo, nullptr, OUTPUT_SIZE);//1st run for s2mm has started
auto mm2s_run = mm2s(in_bo, nullptr, OUTPUT_SIZE);
auto ghdl=xrt::graph(device,uuid,"gr");
ghdl.run(4);
// Wait graph for some cycles
ghdl.end(5); // wait for AIE kernel to complete or 5 milliseconds

long long data_out_count = handle.read();
long long data_in_count = handle2.read();
handle.stop();
handle2.stop();
std::cout<<"Output data received:"<<data_out_count<<std::endl;
std::cout<<"Input data sent:"<<data_in_count<<std::endl;
```

Note: mm2s must be started after handle.start(). Data transfer begins as soon as mm2s starts, so if mm2s is started first, data is already moving before handle.start() and ghdl.run(4) are called, and those early transfers might not be counted.

The output is similar to the following:

```
Output data received:0
Input data sent:104
```
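As a rough illustration of how such counts might be read, the following sketch compares them against the totals the host expects to transfer. The helper `estimate_status`, the `TransferStatus` enum, and the expected totals are hypothetical and are not part of XRT. All values are assumed to be in the same unit that the profiling counter reports.

```cpp
#include <string>

// Hypothetical helper, not part of XRT: a rough reading of two profiling
// counts against the totals the host expects to move. All four values are
// assumed to use the same unit as the profiling counter.
enum class TransferStatus { NotStarted, Incomplete, Completed };

TransferStatus estimate_status(long long in_count, long long out_count,
                               long long expected_in, long long expected_out) {
    if (in_count >= expected_in && out_count >= expected_out)
        return TransferStatus::Completed;   // everything expected has moved
    if (in_count == 0 && out_count == 0)
        return TransferStatus::NotStarted;  // no traffic was observed at all
    return TransferStatus::Incomplete;      // some traffic, but short of the totals
}

// Human-readable label for logging.
std::string to_string(TransferStatus s) {
    switch (s) {
        case TransferStatus::NotStarted: return "not started";
        case TransferStatus::Incomplete: return "incomplete";
        case TransferStatus::Completed:  return "completed";
    }
    return "unknown";
}
```

With the counts shown above and an assumed expectation of 1024 on each side, `estimate_status(104, 0, 1024, 1024)` returns `TransferStatus::Incomplete`. A single snapshot cannot distinguish a design that is still running from one that is deadlocked, so this result is only a starting point for further analysis.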

The amount of data transferred on the input and output gives an estimate of the status of the design. In this output, input data was sent but no output data was received, which is consistent with a stalled design. The timed wait in the above code (ghdl.end(5)) can be replaced with the sleep or usleep APIs to wait for a suitable amount of time, depending on the scale of the design.
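One possible way to apply a sleep-based wait is to sample a counter repeatedly and check whether it is still advancing. The following is a minimal sketch under stated assumptions: `wait_for_progress` is a hypothetical helper, not an XRT API, and `read_count` stands in for a call such as `handle.read()`. The poll interval and idle limit are illustrative values that would need tuning for the scale of the design.

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical helper, not part of XRT. `read_count` stands in for a call
// such as handle.read(); `expected` is the total the host expects to see.
// Returns true once the counter reaches `expected`. Returns false if the
// counter stays unchanged for `max_idle_polls` consecutive polls, which
// hints at a stall.
bool wait_for_progress(const std::function<long long()>& read_count,
                       long long expected,
                       std::chrono::milliseconds poll_interval,
                       int max_idle_polls) {
    long long last = read_count();
    int idle = 0;
    while (last < expected) {
        if (idle >= max_idle_polls)
            return false;                        // no progress for too long
        std::this_thread::sleep_for(poll_interval);
        long long now = read_count();
        idle = (now == last) ? idle + 1 : 0;     // reset when the counter moves
        last = now;
    }
    return true;                                 // expected total was reached
}
```

In host code, the reader could be a lambda such as `[&]{ return (long long)handle.read(); }`, assuming the profiling handle is still running when it is sampled. A `false` result only indicates that the counter stopped advancing within the polling window. It does not prove a deadlock on its own.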

If necessary, an Integrated Logic Analyzer (ILA) can be inserted to probe the interfaces of the PL kernels to detect the AI Engine and PL kernels’ running status.

Refer to AI Engine Status Analysis for how to use Vitis Analyzer to understand the AI Engine status in both hardware and hardware emulation.