AI Engine Deadlock Analysis - 2024.1 English

Vitis Tutorials: AI Engine

Document ID: XD100
Release Date: 2024-06-19
Version: 2024.1 English

Version: Vitis 2024.1

This tutorial introduces you to some common deadlock scenarios and shows you how to detect deadlocks (design hangs) in different tool flows. The methods introduced to detect and analyze deadlock issues include:

  1. Using event trace in AMD Vitis™ Analyzer to analyze design hangs.

  2. Using waveforms in hardware emulation to check AI Engine input and output activities.

  3. Using event APIs to analyze data activities for AI Engine input and output in hardware flows.

  4. Using xbutil to report AI Engine and AI Engine shim status.

  5. Using the devmem Linux command to probe AI Engine registers.

Note: The default working directory in this step is testcase_nofifo_hang, unless explicitly stated otherwise.

Common Deadlock Scenarios

A deadlock is usually caused by insufficient FIFO depth, or by mismatched access rates between multiple FIFOs (or between different branches of the same net in a stream multicast). The following figure shows some deadlock scenarios:

Deadlock Scenarios

Scenario 1: This scenario occurs when K1 tries to write to FIFO0, but the FIFO is full. K2 is still waiting for data coming from FIFO1 before consuming data from FIFO0.

Scenario 2: This scenario occurs when NET0 multicasts to multiple destinations, and the destinations are connected by a stream or cascade stream (FIFO1). NET0 branch 1 is full because K2 is waiting for data from FIFO1, but K1 still needs data from NET0 branch 0 to produce data for FIFO1. Because a multicast net stalls when any of its branches is full, K1 never receives that data.

Scenario 3: This scenario occurs when K1 and K2 are connected by both buffers (including RTP buffers) and streams. While K1 is trying to write data to K2 through FIFO0, K2 is still trying to acquire the lock for the ping or pong buffer. K1 does not release the buffer lock until it finishes its current iteration, so neither kernel can proceed.
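Scenario 1 can be reproduced in a small software model. The following is a minimal sketch in plain C++, not AI Engine code: BoundedFifo and scenario1_completes are hypothetical names, and the model assumes that K1 writes all n samples to FIFO0 before writing to FIFO1, while K2 reads all n samples from FIFO1 before reading FIFO0. The function returns false when neither kernel can make progress.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Hypothetical model of a stream FIFO with a fixed depth (in samples).
struct BoundedFifo {
    std::size_t depth;
    std::deque<int> data;
    bool full() const { return data.size() >= depth; }
    bool empty() const { return data.empty(); }
};

// Models Scenario 1 under the stated assumptions:
//   K1 writes n samples to FIFO0, then n samples to FIFO1.
//   K2 reads n samples from FIFO1, then n samples from FIFO0.
// Returns true if both kernels finish, false if the model deadlocks.
bool scenario1_completes(std::size_t fifo0_depth, std::size_t fifo1_depth,
                         std::size_t n) {
    assert(fifo0_depth > 0 && fifo1_depth > 0);
    BoundedFifo fifo0{fifo0_depth, {}};
    BoundedFifo fifo1{fifo1_depth, {}};
    std::size_t k1_written = 0;  // first n writes go to FIFO0, the rest to FIFO1
    std::size_t k2_read = 0;     // first n reads come from FIFO1, the rest from FIFO0

    while (k1_written < 2 * n || k2_read < 2 * n) {
        bool progress = false;

        // K1: one blocking write attempt per modeled cycle.
        if (k1_written < 2 * n) {
            BoundedFifo& dst = (k1_written < n) ? fifo0 : fifo1;
            if (!dst.full()) {
                dst.data.push_back(static_cast<int>(k1_written));
                ++k1_written;
                progress = true;
            }
        }

        // K2: one blocking read attempt per modeled cycle.
        if (k2_read < 2 * n) {
            BoundedFifo& src = (k2_read < n) ? fifo1 : fifo0;
            if (!src.empty()) {
                src.data.pop_front();
                ++k2_read;
                progress = true;
            }
        }

        // Neither kernel moved: K1 is blocked on a full FIFO and
        // K2 is blocked on an empty one. This is the deadlock.
        if (!progress) return false;
    }
    return true;
}
```

Under these assumptions, scenario1_completes(4, 4, 8) reports a deadlock because FIFO0 fills after four samples while K2 is still waiting on FIFO1. scenario1_completes(8, 4, 8) completes because FIFO0 can hold everything K1 writes before it switches to FIFO1.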

AI Engine Deadlock Example and Analysis in AI Engine Simulator

The example is similar to the one used in AI Engine Execution and Measurement, except that it does not have a FIFO for the stream connection:

Deadlock Example Graph

When the design stalls, graph::wait() and graph::end() hang. To interrupt graph execution, use one of the following methods:

  • Using graph::wait(CYCLE_NUMBER): Specifies the number of cycles to wait before the API returns. If the graph has not finished after CYCLE_NUMBER cycles, the API returns anyway.

  • Using graph::end(CYCLE_NUMBER): Specifies the number of cycles to wait before the graph is ended. If the graph has not finished after CYCLE_NUMBER cycles, the API ends the graph anyway.

  • Using the --simulation-cycle-timeout CYCLE_NUMBER option for aiesimulator.

The CYCLE_NUMBER should be large enough for the AI Engine simulator to record all the stall events, or for the hardware to reach the hang state.
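The timeout behavior described above can be pictured as a wait loop with a cycle budget. This is a minimal sketch in plain C++, not the ADF implementation: wait_with_cycle_budget, WaitResult, and the step callback are hypothetical names used only for illustration. Control returns to the caller either when the modeled design finishes or when the budget runs out, which is what allows a hung run to be inspected afterwards.

```cpp
#include <cstdint>
#include <functional>

// Hypothetical result type for the model below.
enum class WaitResult { Finished, TimedOut };

// step() advances the modeled design by one cycle and returns true once
// the design has finished. The loop gives up after cycle_budget cycles,
// so a design that never finishes (a hang) cannot block the caller forever.
WaitResult wait_with_cycle_budget(const std::function<bool()>& step,
                                  std::uint64_t cycle_budget) {
    for (std::uint64_t cycle = 0; cycle < cycle_budget; ++cycle) {
        if (step()) {
            return WaitResult::Finished;
        }
    }
    return WaitResult::TimedOut;
}
```

In this model, a budget that is too small returns TimedOut even for a healthy design, which mirrors the reason CYCLE_NUMBER needs to be chosen generously.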

  1. In this example, examine aie/graph.cpp. The graph runs for four iterations and the host waits for at most 10000 cycles:

    gr.init();
    gr.run(4);
    gr.wait(10000);
    
  2. Run AI Engine simulator using the following command:

    make aiesim
    
  3. Open Trace view in Vitis Analyzer by using the following command:

    vitis_analyzer aiesimulator_output/default.aierun_summary
    

    Trace View

    The hang occurs after the following activities:

    1: Kernel aie_dest1 acquires the lock of read buffer (buf0) and write buffer (buf1).

    2: Kernel aie_dest1 starts.

    3: Kernel hangs in stream stall.

    4: S2mm is waiting for kernel aie_dest1 to release buffer buf0.

AI Engine Stall Analysis with Vitis Analyzer

Vitis Analyzer can use the event trace from the AI Engine simulation to perform stall analysis, which summarizes the stall status as metrics. It also helps you determine where the stall happened and the possible causes.

If you are using Vitis Analyzer to do stall analysis, run the AI Engine simulator with --online -wdb -ctf options to generate event trace information in the background:

aiesimulator --pkg-dir=./Work --online -wdb -ctf

Note: For more information about AI Engine stall analysis using Vitis Analyzer in the hardware emulation flow, refer to the Versal ACAP AI Engine Programming Environment User Guide (UG1076).

In Vitis Analyzer, the Performance Metrics view gives an overview of the stalls in the design:

Performance Metrics View

Each tile shows percentages for each type of stall. From the metrics table, it can be seen that tile (24,0) has a large percentage of lock stall (98.896%), and tile (25,0) has a large percentage of stream stall (98.380%). These metrics indicate that the design is hanging, and that analysis is required.
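Reading the metrics table amounts to looking for tiles whose time is dominated by a single stall type. The following is a minimal sketch of that check, assuming a hypothetical TileStall record and an arbitrary 90% threshold; neither is part of Vitis Analyzer.

```cpp
#include <string>

// Hypothetical record holding two of the stall percentages shown per tile.
struct TileStall {
    int col;
    int row;
    double lock_stall_pct;
    double stream_stall_pct;
};

// Returns "lock" or "stream" when that stall type reaches the threshold,
// otherwise "none". The 90% default is an assumption for illustration only.
std::string dominant_stall(const TileStall& tile, double threshold_pct = 90.0) {
    if (tile.lock_stall_pct >= threshold_pct &&
        tile.lock_stall_pct >= tile.stream_stall_pct) {
        return "lock";
    }
    if (tile.stream_stall_pct >= threshold_pct) {
        return "stream";
    }
    return "none";
}
```

With the figures quoted above, a record for tile (24,0) with a lock stall of 98.896% is classified as "lock", and a record for tile (25,0) with a stream stall of 98.380% is classified as "stream". The remaining field of each record is not quoted in the text, so any value used there is a placeholder.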

In the Graph view of Vitis Analyzer, you can visualize the stalled path in the graph, which indicates where the stall happened in the design. By understanding the design behavior, it is also possible to estimate the cause of the hang.

For example, select the stream stall in Trace view, and switch to Graph view. In this design, kernel k[0] hangs in stream stall. The full destination port is gr.k[1]/in, which means that the destination kernel k[1] is not receiving data from the stream.