AI Engine Runtime Events Handling - 2024.1 English

AI Engine System Software Driver Reference Manual (UG1642)

Document ID
UG1642
Release Date
2024-05-30
Version
2024.1 English

The AI Engine events that are treated as errors and generate an interrupt are configured at compile time, during CDO generation. At runtime, the AI Engine system software (SSW) stack only reports errors; it does not change the error interrupt configuration. In the Linux flow, the AI Engine kernel driver notifies the XRT kernel driver of errors by calling the callback that XRT registered.
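
The registered-callback notification described above can be pictured with a small user-space C++ sketch. It is only an illustration of the pattern, not the actual kernel driver code: ErrorRecord, ErrorReporter, register_callback, and report are hypothetical names introduced here, and the real driver interfaces differ.

```cpp
#include <functional>
#include <utility>

// Hypothetical error record, for illustration only; not a driver structure.
struct ErrorRecord {
    int event;  // error event number
    int col;    // tile column
    int row;    // tile row
};

// Stand-in for the reporting side (the role the AI Engine driver plays).
class ErrorReporter {
public:
    using Callback = std::function<void(const ErrorRecord&)>;

    // The listener (the role XRT plays) registers its callback once.
    void register_callback(Callback cb) { callback_ = std::move(cb); }

    // Forwards an observed error to the registered callback, if any.
    // It only reports; it never changes which events count as errors.
    void report(const ErrorRecord& rec) const {
        if (callback_) {
            callback_(rec);
        }
    }

private:
    Callback callback_;  // empty until a listener registers
};
```

In this sketch, a listener would call register_callback() once with a function that records or logs the error, and every later report() call invokes that function with the error details.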

User Space Error APIs

These APIs are exposed through XRT. The xrt::error class and its member functions retrieve asynchronous errors in the user space host code, which helps with debugging when something goes wrong. The API details are as follows:

error::get_error_code()
Gets the error code of the last error.
error::get_timestamp()
Gets the timestamp of the last error.
error::to_string()
Gets the description string of the last error.
Example code:
graph.run(runIteration);
try {
    // Throws std::system_error if the graph does not finish within the timeout
    graph.wait(timeout);
}
catch (const std::system_error& ex) {
    if (ex.code().value() == ETIME) {
        // Timed out: retrieve the last asynchronous AI Engine error
        xrt::error error(device, XRT_ERROR_CLASS_AIE);
        auto errCode = error.get_error_code();
        auto timestamp = error.get_timestamp();
        auto err_str = error.to_string();
        /* code to deal with this specific error */
        std::cout << err_str << std::endl;
    } else {
        /* Something else */
    }
}
Figure 1. Example Output
The example output reports that aie aie0 asserted tile error event 60 at col 25 row 1. The preceding figure shows an error propagated from the AI Engine array, which can be used to debug application-specific errors. For the list of error events, see AI Engine Error Events in the AI Engine Tools and Flows User Guide (UG1076). Error event 60 represents a DM address out of range, so in this example the out-of-range access occurred in the tile at column 25, row 1.
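
If the host code needs the event number and tile location as integers, the description string can be parsed. The following is a minimal sketch, assuming the string contains the phrase "error event <N> at col <C> row <R>" as in the single message quoted above. The actual format may vary between releases, and TileErrorInfo and parse_tile_error are hypothetical names, not part of XRT.

```cpp
#include <optional>
#include <regex>
#include <string>

// Hypothetical holder for the parsed fields; not an XRT type.
struct TileErrorInfo {
    int event;  // error event number, for example 60
    int col;    // tile column
    int row;    // tile row
};

// Assumption: the description contains "error event <N> at col <C> row <R>".
// Any string that does not match this shape yields std::nullopt.
std::optional<TileErrorInfo> parse_tile_error(const std::string& msg) {
    static const std::regex pattern(
        R"(error event (\d+) at col (\d+) row (\d+))");
    std::smatch m;
    if (!std::regex_search(msg, m, pattern)) {
        return std::nullopt;
    }
    return TileErrorInfo{std::stoi(m[1].str()),
                         std::stoi(m[2].str()),
                         std::stoi(m[3].str())};
}
```

In the example code, the string returned by error.to_string() could be passed to parse_tile_error(), and the resulting event number looked up in the error event list in UG1076. The sketch requires C++17 for std::optional.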