The AI Engine error events that generate interrupts are configured at compile time, during CDO generation. At runtime, the software stack only reports errors; it does not change the error interrupt configuration. In the XRT flow, the AI Engine kernel driver notifies the XRT kernel driver of errors by calling the callback that XRT registered, and the XRT kernel driver handles all the errors. The XRT software stack needs to provide APIs for high-level libraries or applications to query graph errors. Errors can occur at any time after the AI Engine is loaded. If no application has requested the partition when errors occur, the errors are not cleared; the AI Engine kernel driver notifies XRT later, when XRT registers the error callback.
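The deferred notification described above (errors stay latched until a callback is registered, then are reported) can be sketched in plain C. This is only an illustrative model, not the AI Engine or XRT kernel driver code: `err_state`, `err_raise`, `err_register`, `err_flush`, and `record_errors` are hypothetical names, the error groups are modeled as a simple bitmask, and clearing the latched bits on delivery is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback type: receives a bitmask of pending error groups. */
typedef void (*err_cb_t)(uint32_t pending, void *ctx);

/* Hypothetical stand-in for per-partition error state kept by a driver. */
struct err_state {
    uint32_t pending;   /* latched error bits not yet delivered */
    err_cb_t cb;        /* registered callback, NULL if none */
    void *ctx;          /* opaque pointer handed back to the callback */
};

/* Deliver latched errors to the callback, if one exists, and clear them. */
static void err_flush(struct err_state *s)
{
    if (s->cb != NULL && s->pending != 0) {
        uint32_t bits = s->pending;
        s->pending = 0;
        s->cb(bits, s->ctx);
    }
}

/* Simulated interrupt path: latch the error, report it if possible. */
static void err_raise(struct err_state *s, uint32_t bits)
{
    s->pending |= bits;
    err_flush(s);
}

/* Register a callback; errors raised earlier are reported immediately. */
static void err_register(struct err_state *s, err_cb_t cb, void *ctx)
{
    assert(cb != NULL);
    s->cb = cb;
    s->ctx = ctx;
    err_flush(s);
}

/* Example callback: accumulates reported bits into a uint32_t. */
static void record_errors(uint32_t pending, void *ctx)
{
    *(uint32_t *)ctx |= pending;
}
```

In this model, errors raised before `err_register` is called are not lost: they remain in `pending` and are handed to the callback at registration time, which mirrors the late-notification behavior described for the XRT flow.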
In flows without XRT, userspace libraries handle errors. The AI Engine file descriptor is used to poll for errors. The AI Engine embeddedsw driver enables error notification through polling of the AI Engine file descriptor, and it provides a wrapper API for you to poll. AMD removed error signaling to applications and error callback registration from the application, so there is no need to spawn a thread to monitor errors. The AI Engine embeddedsw driver also provides APIs for you to get the details of the groups of errors that happened and the details of the individual errors. When errors happen, the application is expected to reset and restart to recover from them.
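As a rough illustration of polling a file descriptor for error notification, the sketch below uses the standard POSIX `poll()` call directly. It does not use the embeddedsw driver's wrapper API, whose name and signature are not given here. The helper `wait_for_errors` is a hypothetical name, and the assumption that a pending error shows up as `POLLIN` readiness on the descriptor is mine, not something stated by the driver documentation.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/*
 * Hypothetical helper: wait up to timeout_ms for fd to become readable.
 * Returns 1 if the descriptor is readable (assumed to mean "errors pending"),
 * 0 on timeout, and -1 if poll() fails or reports an unexpected condition.
 */
static int wait_for_errors(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
    int rc;

    assert(fd >= 0);

    rc = poll(&pfd, 1, timeout_ms);
    if (rc < 0)
        return -1;              /* poll() failed, for example EINTR */
    if (rc == 0)
        return 0;               /* timeout, nothing reported */
    return (pfd.revents & POLLIN) ? 1 : -1;
}
```

Because the call takes a timeout, an application could invoke it with a timeout of 0 between iterations of its main loop instead of dedicating a thread to error monitoring. When it returns 1, the application would then query the error details through the driver's APIs and proceed to reset and restart.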