The AI Engine error events that generate interrupts are configured at compile time, during CDO generation. At runtime, the AI Engine SSW stack only reports errors; it does not change the error interrupt configuration. In the Linux flow, the AI Engine kernel driver notifies the XRT kernel driver of errors by calling the callback that XRT registered.
User space Error APIs
These APIs are exposed through XRT. The xrt::error class and its member functions retrieve asynchronous errors in the user space host code, which helps with debugging when something goes wrong. The API details are as follows:
- error::get_error_code(): Gets the last error code.
- error::get_timestamp(): Gets the timestamp of the last error.
- error::to_string(): Gets the description string of the error code.
    // Needs <cerrno>, <iostream>, <system_error>, and the XRT headers
    // that declare the graph and xrt::error classes.
    graph.run(runIteration);
    try {
        graph.wait(timeout);
    }
    catch (const std::system_error& ex) {
        if (ex.code().value() == ETIME) {
            // The wait timed out: query the last asynchronous AIE error.
            xrt::error error(device, XRT_ERROR_CLASS_AIE);
            auto errCode = error.get_error_code();
            auto timestamp = error.get_timestamp();
            auto err_str = error.to_string();
            /* code to deal with this specific error */
            std::cout << err_str << std::endl;
        } else {
            /* Something else */
        }
    }
    aie aie0 asserted tile error event 60 at col 25 row 1.

The preceding output shows an error propagated from the AI Engine array, which is used to debug application-specific errors. For the list of error events, see AI Engine Error Events in the AI Engine Tools and Flows User Guide (UG1076).
Note that error event 60 in the output above indicates a DM (data memory) address out of range, here occurring in the tile at col 25, row 1.
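
When such a message is available as a string in host code, the event number and tile coordinates can be pulled out for logging or for a lookup against the error event list. The following is a minimal sketch only: the AieTileError struct and the parse_aie_error function are hypothetical helpers, not part of XRT, and the message layout "error event <N> at col <C> row <R>" is assumed from the sample shown in this section and may differ between releases.

```cpp
#include <cstdio>
#include <optional>
#include <string>

// Hypothetical helper type (not an XRT type): one decoded tile error.
struct AieTileError {
    int event;  // error event number, for example 60
    int col;    // tile column
    int row;    // tile row
};

// Looks for "error event <N> at col <C> row <R>" anywhere in msg.
// The layout is an assumption based on the sample message in this section.
// Returns std::nullopt when the pattern is not found or is incomplete.
inline std::optional<AieTileError> parse_aie_error(const std::string& msg) {
    static const std::string marker = "error event ";
    const std::size_t pos = msg.find(marker);
    if (pos == std::string::npos) {
        return std::nullopt;
    }
    AieTileError err{};
    const int matched = std::sscanf(msg.c_str() + pos,
                                    "error event %d at col %d row %d",
                                    &err.event, &err.col, &err.row);
    if (matched != 3) {
        return std::nullopt;
    }
    return err;
}
```

Calling parse_aie_error on the sample message above yields event 60, col 25, and row 1; a string without the pattern yields an empty optional, so the caller can fall back to printing the raw message. The sketch requires C++17 for std::optional.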