This section describes the errors reported through the XRT error reporting APIs described previously, together with related debug information. These errors are propagated from the AI Engine array and can be used to debug application-specific errors in hardware.
For errors with class XRT_ERROR_CLASS_AIE, as defined in https://github.com/Xilinx/XRT/blob/master/src/runtime_src/core/include/xrt_error_code.h, you can obtain additional information by enabling the dmesg logs, which report the causes of the errors (described in the following tables). An example log is shown here:
[18.462615] aie aie0: Asserted tile error event 56 at col 6 row 7
[18.471397] aie aie0: Asserted tile error event 60 at col 25 row 1
Each entry reports the error event number along with the col and row number of the tile that asserted it. Row 0 is the SHIM (interface) tile; AI Engine tiles start from row 1. The following tables list the error categories, with the exact error number, a description, and tips on the next steps to debug and resolve each error.
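Because every entry follows the same pattern, a short script can turn a dmesg capture into a list of table lookups. The Python sketch below assumes that the messages look exactly like the example log above and that the event number in the log matches the No. column of the tables in this section; `parse_line`, `classify`, and `TileError` are hypothetical helper names, not part of XRT or the aie driver. Since the core module numbers (54-67) and memory module numbers (88-100) do not overlap, the event number alone selects the table for rows 1 and above, while row 0 selects the interface tile table (62-72).

```python
import re
from typing import NamedTuple, Optional

# ASSUMPTION: message format copied from the example dmesg log in this section.
LOG_PATTERN = re.compile(
    r"Asserted tile error event (?P<event>\d+) at col (?P<col>\d+) row (?P<row>\d+)"
)

# Ranges spanning the "No." values listed in the three error tables.
CORE_MODULE_EVENTS = range(54, 68)     # AI Engine core module errors
MEMORY_MODULE_EVENTS = range(88, 101)  # AI Engine memory module errors
INTERFACE_TILE_EVENTS = range(62, 73)  # interface (SHIM) tile errors, row 0


class TileError(NamedTuple):
    event: int
    col: int
    row: int
    table: str  # which error table to look the event up in


def classify(event: int, row: int) -> str:
    """Pick the error table for an event; row 0 is the interface tile."""
    if row == 0:
        return "interface tile" if event in INTERFACE_TILE_EVENTS else "unknown"
    if event in CORE_MODULE_EVENTS:
        return "core module"
    if event in MEMORY_MODULE_EVENTS:
        return "memory module"
    return "unknown"


def parse_line(line: str) -> Optional[TileError]:
    """Decode one dmesg line, or return None if it is not a tile error."""
    m = LOG_PATTERN.search(line)
    if m is None:
        return None
    event, col, row = (int(m.group(k)) for k in ("event", "col", "row"))
    return TileError(event, col, row, classify(event, row))


if __name__ == "__main__":
    log = """\
[18.462615] aie aie0: Asserted tile error event 56 at col 6 row 7
[18.471397] aie aie0: Asserted tile error event 60 at col 25 row 1"""
    for line in log.splitlines():
        print(parse_line(line))
```

For the example log, both entries fall in the core module range: event 56 is a stream packet parity error and event 60 is a data memory access outside the valid address range.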
AI Engine core module errors:

| Error Group | No. | Name | Description | Debug Tips |
|---|---|---|---|---|
| Instruction Errors | 59 | Instruction Decompression Error | Event generated when the AI Engine cannot decompress a fetched instruction. This can happen if the program instructions are corrupt. Validate the ELF generation. | Regenerate the ELF file with the Vitis compiler (`v++`) `--package` command. If the issue persists, contact AMD support. |
| Access Errors | 55 | PM Reg Access Failure | This error can happen on a bank access conflict to program memory (PM) between the memory mapped AXI interface and the AI Engine. | Contact AMD support. |
| | 60 | DM address out of range | Event generated if the AI Engine tries to access a memory location outside of 0x20000–0x3FFFF. | Run the AI Engine simulator (`aiesimulator`) with `--enable-memory-check`, which flags any access violations. Alternatively, run `x86simulator` with `--valgrind`, which also flags access violations. |
| | 65 | PM address out of range | Event generated if the program counter (PC) is out of range. | Run the AI Engine simulator (`aiesimulator`) with `--enable-memory-check`, which flags any access violations. Alternatively, run `x86simulator` with `--valgrind`, which also flags access violations. |
| | 66 | DM access to unavailable | Event generated if the AI Engine issues an access to an isolated tile in its neighborhood. | Check whether the kernel running on the AI Engine accesses the data memory of an isolated tile (a tile in a different partition). If the issue persists, contact AMD support. |
| Bus Errors | 58 | AXI MM Slave Error | Event generated if a memory mapped AXI interface slave read/write request is for an address that does not exist in the AI Engine tile. | If the PL IP accesses the AI Engine registers through the memory mapped AXI interface, check whether the PL IP accesses invalid registers. If the issue persists, contact AMD support. |
| Stream Errors | 54 | TLAST in WSS words 0-2 | Event generated if TLAST is not on the fourth word of a wide stream. | If a PL IP generates the stream, check that it generates TLAST correctly. If the issue persists, contact AMD support. |
| | 56 | Stream Pkt Parity Error | Event generated if there is a parity error in the packet header. | Check the data source, such as a PL IP that generates the packets, to see whether the packets are valid and the parity bit is calculated correctly. If the data comes from a PL IP, check the packet header it generates. |
| | 57 | Control Pkt Error | Control packet error. | Check the data source, such as a PL IP that generates the packets, to see whether it generates the packets correctly. If the issue persists, contact AMD support. |
| ECC Errors | 64 | PM ECC Error 2bit | Event generated when a 2-bit ECC error is detected in program memory. | Re-run the application. If the issue persists, contact AMD support. |
| | 62 | PM ECC Error Scrub 2bit | Event generated if the ECC scrubber detects a 2-bit ECC error in program memory. | Re-run the application. If the issue persists, contact AMD support. |
| Lock Errors | 67 | Lock Access to unavailable | Event generated if the AI Engine issues a lock access to an isolated tile in its neighborhood. | Run the AI Engine simulator (`aiesimulator`) with `--enable-memory-check`, which flags any access violations. Alternatively, run `x86simulator` with `--valgrind`, which also flags access violations. If the issue persists, contact AMD support. |
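For a Stream Pkt Parity Error (event 56), a common suspect is the parity bit in a packet header built by PL IP. The Python sketch below builds and checks a 32-bit header so that an IP's output can be compared against a reference. The bit layout (packet ID in bits 4:0, packet type in bits 14:12, source row in bits 20:16, source column in bits 27:21, odd parity over the whole word in bit 31) is an assumption taken from AMD's Vitis packet-switching tutorials, not from this section, so confirm it against the AI Engine architecture manual for your device; `make_packet_header` and `header_parity_ok` are hypothetical names.

```python
# ASSUMED header layout (from AMD Vitis packet-switching tutorials; verify in AM009):
#   bits 4:0 packet ID, 14:12 packet type, 20:16 source row,
#   27:21 source column, bit 31 = odd parity over the full 32-bit word.

def make_packet_header(pkt_id: int, pkt_type: int, src_row: int, src_col: int) -> int:
    """Build a 32-bit stream packet header with the parity bit in bit 31."""
    if not (0 <= pkt_id < 1 << 5 and 0 <= pkt_type < 1 << 3
            and 0 <= src_row < 1 << 5 and 0 <= src_col < 1 << 7):
        raise ValueError("header field out of range")
    header = pkt_id               # bits 4:0
    header |= pkt_type << 12      # bits 14:12
    header |= src_row << 16       # bits 20:16
    header |= src_col << 21       # bits 27:21
    # Set bit 31 when bits 30:0 hold an even number of 1s, so the
    # complete word always has an odd number of 1s.
    if bin(header).count("1") % 2 == 0:
        header |= 1 << 31
    return header


def header_parity_ok(header: int) -> bool:
    """True if the 32-bit header has odd parity, the property event 56 checks."""
    return bin(header & 0xFFFFFFFF).count("1") % 2 == 1
```

Feeding the same field values to the PL IP and to `make_packet_header`, then comparing the resulting words, isolates parity bugs from routing or packet ID mistakes.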
AI Engine memory module errors:

| Error Group | No. | Name | Description | Debug Tips |
|---|---|---|---|---|
| ECC Errors | 88 | DM ECC Error Scrub 2bit | Event generated when the ECC scrubber detects a 2-bit ECC error in bank 0 or bank 1 of DM. | Re-run the application. If the issue persists, contact AMD support. |
| | 90 | DM ECC Error 2bit | Event generated when a 2-bit ECC error is detected during access to bank 0 or 1 of DM. This data memory ECC error can be caused by DM access from the AI Engine, tile DMA, or memory mapped AXI interface. | Re-run the application. If the issue persists, contact AMD support. |
| Memory Parity Errors | 91 | DM Parity Error Bank 2 | Event generated when a parity error is detected during access to DM bank 2. This data memory parity error can be caused by DM access from the AI Engine, tile DMA, or memory mapped AXI interface. | Re-run the application. If the issue persists, contact AMD support. |
| | 92 | DM Parity Error Bank 3 | Event generated when a parity error is detected during access to DM bank 3. This data memory parity error can be caused by DM access from the AI Engine, tile DMA, or memory mapped AXI interface. | Re-run the application. If the issue persists, contact AMD support. |
| | 93 | DM Parity Error Bank 4 | Event generated when a parity error is detected during access to DM bank 4. This data memory parity error can be caused by DM access from the AI Engine, tile DMA, or memory mapped AXI interface. | Re-run the application. If the issue persists, contact AMD support. |
| | 94 | DM Parity Error Bank 5 | Event generated when a parity error is detected during access to DM bank 5. This data memory parity error can be caused by DM access from the AI Engine, tile DMA, or memory mapped AXI interface. | Re-run the application. If the issue persists, contact AMD support. |
| | 95 | DM Parity Error Bank 6 | Event generated when a parity error is detected during access to DM bank 6. This data memory parity error can be caused by DM access from the AI Engine, tile DMA, or memory mapped AXI interface. | Re-run the application. If the issue persists, contact AMD support. |
| | 96 | DM Parity Error Bank 7 | Event generated when a parity error is detected during access to DM bank 7. This data memory parity error can be caused by DM access from the AI Engine, tile DMA, or memory mapped AXI interface. | Re-run the application. If the issue persists, contact AMD support. |
| DMA Errors | 97 | DMA S2MM 0 Error | This error can be caused by writing to the BD task queue of S2MM channel 0 when it is full. | If you manage buffer descriptors in your application, verify that you are not pushing new buffer descriptors when the queue is full. If the issue persists, contact AMD support. |
| | 98 | DMA S2MM 1 Error | This error can be caused by writing to the BD task queue of S2MM channel 1 when it is full. | If you manage buffer descriptors in your application, verify that you are not pushing new buffer descriptors when the queue is full. If the issue persists, contact AMD support. |
| | 99 | DMA MM2S 0 Error | This error can be caused by writing to the BD task queue of MM2S channel 0 when it is full. | If you manage buffer descriptors in your application, verify that you are not pushing new buffer descriptors when the queue is full. If the issue persists, contact AMD support. |
| | 100 | DMA MM2S 1 Error | This error can be caused by writing to the BD task queue of MM2S channel 1 when it is full. | If you manage buffer descriptors in your application, verify that you are not pushing new buffer descriptors when the queue is full. If the issue persists, contact AMD support. |
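The DMA debug tips come down to one rule: never push a buffer descriptor into a full task queue. A minimal host-side bookkeeping guard is sketched below in Python. `BdTaskQueue` is a hypothetical class, not an AMD or XRT API, and the queue depth is a constructor parameter because this section does not state it; take the real depth from the architecture documentation for your device.

```python
from collections import deque


class BdTaskQueue:
    """Hypothetical model of one DMA channel's BD task queue (not an AMD API)."""

    def __init__(self, depth: int) -> None:
        if depth <= 0:
            raise ValueError("depth must be positive")
        self.depth = depth           # ASSUMPTION: supplied by the caller
        self._pending = deque()      # BD IDs in submission order

    def is_full(self) -> bool:
        return len(self._pending) >= self.depth

    def try_push(self, bd_id: int) -> bool:
        """Enqueue a BD only if there is room.

        Pushing into a full queue is one cause of DMA errors 97-100
        (memory module) and 69-72 (interface tile), so refuse instead.
        """
        if self.is_full():
            return False
        self._pending.append(bd_id)
        return True

    def complete(self) -> int:
        """Record that the DMA finished the oldest BD, freeing one slot."""
        return self._pending.popleft()
```

A caller that gets `False` from `try_push` should wait for a completion (for example, by polling the channel status) before retrying, rather than writing the BD anyway.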
Interface tile errors:

| Error Group | No. | Name | Description | Debug Tips |
|---|---|---|---|---|
| Bus Errors | 62 | AXI MM Slave Tile Error | Event generated if a memory mapped AXI interface slave request comes to an interface tile but the address is invalid. | If the PL IP accesses the AI Engine registers through the memory mapped AXI interface, check whether the IP tries to access a wrong address. If the issue persists, contact AMD support. |
| | 64 | AXI MM Decode NSU Error | The memory mapped AXI interface traffic has internally received a DECERR response. For example, if a column or a set of tiles is clock gated, a decode error is generated internally and travels over the memory mapped AXI interface to the interface tile, which generates this event. | If the PL IP accesses the AI Engine registers through the memory mapped AXI interface, check whether the IP tries to access a tile that is clock gated. If the issue persists, contact AMD support. |
| | 65 | AXI MM Slave NSU Error | The memory mapped AXI interface traffic has internally received a SLVERR response. For example, an AI Engine tile in that interface tile's column has responded with a slave error, which travels over the memory mapped AXI interface to the interface tile as a slave error. | If the PL IP accesses the AI Engine registers through the memory mapped AXI interface, check whether the IP tries to access a wrong address. If the issue persists, contact AMD support. |
| | 66 | AXI MM Unsupported Traffic | The memory mapped AXI interface from the NoC has made a request that the AI Engine does not support. | If the PL IP accesses the AI Engine registers through the memory mapped AXI interface, check whether the IP generates unsupported memory mapped AXI interface requests. |
| | 67 | AXI MM Unsecure Access in Secure Mode | The memory mapped AXI interface from the NoC is violating secure mode (it is trying to route unsecured traffic when the AI Engine only supports secure traffic). | Check whether the AI Engine array is configured in secure mode. |
| | 68 | AXI MM Byte Strobe Error | The memory mapped AXI interface from the NoC is writing incomplete 32-bit words (within a 32-bit word, all byte strobes must be set). | If the PL IP accesses the AI Engine through the memory mapped AXI interface, check whether all byte strobes are set for each 32-bit word. |
| Stream Errors | 63 | Control Pkt Error | Control packet error. | If the PL IP generates the control packets, check whether it generates them properly. If the issue persists, contact AMD support. |
| DMA Errors | 69 | DMA S2MM 0 Error | This DMA error is for DMA S2MM channel 0. It can be caused by writing to the BD task queue when it is full, or by an invalid memory address in an interface tile DMA buffer descriptor. | If you manage buffer descriptors in your application, verify that you are not pushing new buffer descriptors when the queue is full, and check that the memory addresses sent to the interface tile DMA buffer descriptors are valid. If the issue persists, contact AMD support. |
| | 70 | DMA S2MM 1 Error | This DMA error is for DMA S2MM channel 1. It can be caused by writing to the BD task queue when it is full, or by an invalid memory address in an interface tile DMA buffer descriptor. | If you manage buffer descriptors in your application, verify that you are not pushing new buffer descriptors when the queue is full, and check that the memory addresses sent to the interface tile DMA buffer descriptors are valid. If the issue persists, contact AMD support. |
| | 71 | DMA MM2S 0 Error | This DMA error is for DMA MM2S channel 0. It can be caused by writing to the BD task queue when it is full, or by an invalid memory address in an interface tile DMA buffer descriptor. | If you manage buffer descriptors in your application, verify that you are not pushing new buffer descriptors when the queue is full, and check that the memory addresses sent to the interface tile DMA buffer descriptors are valid. If the issue persists, contact AMD support. |
| | 72 | DMA MM2S 1 Error | This DMA error is for DMA MM2S channel 1. It can be caused by writing to the BD task queue when it is full, or by an invalid memory address in an interface tile DMA buffer descriptor. | If you manage buffer descriptors in your application, verify that you are not pushing new buffer descriptors when the queue is full, and check that the memory addresses sent to the interface tile DMA buffer descriptors are valid. If the issue persists, contact AMD support. |
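Events 64 and 65 originate from DECERR and SLVERR responses, and event 68 from incomplete byte strobes, so a testbench for a PL IP that drives the memory mapped AXI interface can screen for both. The Python sketch below uses the standard 2-bit RRESP/BRESP encodings from the AMBA AXI specification. The strobe rule it applies (each 32-bit word must have either all four strobes set or none, treating an all-clear word as simply not written) is an interpretation of the event 68 description, not a documented AMD rule; `decode_resp` and `partial_words` are hypothetical names.

```python
from typing import List

# Standard AXI RRESP/BRESP encodings (AMBA AXI specification).
AXI_RESP = {0b00: "OKAY", 0b01: "EXOKAY", 0b10: "SLVERR", 0b11: "DECERR"}


def decode_resp(resp: int) -> str:
    """Name a 2-bit AXI response; SLVERR/DECERR relate to events 65/64."""
    return AXI_RESP[resp & 0b11]


def partial_words(wstrb: int, data_bytes: int) -> List[int]:
    """Return indices of 32-bit words whose byte strobes are only partly set.

    ASSUMPTION: a word with all four strobes clear is not written and is
    acceptable; a word with one to three strobes set would trigger event 68.
    """
    if data_bytes % 4:
        raise ValueError("data width must be a multiple of 32 bits")
    bad = []
    for word in range(data_bytes // 4):
        nibble = (wstrb >> (4 * word)) & 0xF  # 4 strobe bits per 32-bit word
        if nibble not in (0x0, 0xF):
            bad.append(word)
    return bad
```

For a 128-bit data bus (`data_bytes=16`), a strobe value of `0x7FFF` leaves the top byte of word 3 unset, so `partial_words(0x7FFF, 16)` reports word 3.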