AI Engine Status Dump and Errors Statistics - 2024.1 English - UG1642

AI Engine System Software Driver Reference Manual (UG1642)

Document ID: UG1642
Release Date: 2024-05-30
Version: 2024.1 English

The AI Engine kernel partition driver lets you query device errors through the sysfs interface and exposes the main status and PC registers to help you debug Linux runtime issues. The sysfs entries dump the error and status registers in a readable, script-friendly format. In addition, every tile has sysfs entries that show its core, DMA, and lock status and its PC registers. At runtime, you or the runtime utilities can check sysfs to see all errors that have occurred for the application.

The sysfs core dump entries read the core, DMA, and lock status and the PC registers from the hardware and show the values in a readable, script-friendly format. If the application stalls at runtime, for example when there is no output from the AI Engine, sysfs shows where it is stalling. Runtime utilities or offline tools can combine the core dump data with the graph information generated by the AI Engine compiler to debug the application in more detail. The following shows the structure of the sysfs entries:

/sys/class/aie/aiepart_<startcol_numcols>/
|-- <col_row>
| |-- core - For AIE array tiles only.
| |-- dma - For NoC tiles and array tiles only.
| |-- error
| |-- event
| `-- lock - For NoC tiles and array tiles only.
.
.
.
|-- core
|-- dma
|-- error
|-- error_stat
|-- lock
`-- status
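As an illustration of how a script might consume this layout, the following Python sketch walks one partition directory and collects the raw text of each per-tile node. It is a minimal sketch under stated assumptions, not part of the driver or its tooling: the function name `read_tile_nodes` is hypothetical, and the code assumes that tile directories are named `<col>_<row>` with decimal numbers. The content of each node is returned unparsed because its format is not described here.

```python
import re
from pathlib import Path

# Per-tile node names taken from the tree above. Not every tile type
# exposes all of them (for example, "core" exists for AIE array tiles only).
TILE_NODES = ("core", "dma", "error", "event", "lock")

# Assumption: tile directories are named "<col>_<row>" using decimal numbers.
TILE_DIR = re.compile(r"^(\d+)_(\d+)$")


def read_tile_nodes(partition_dir):
    """Return {(col, row): {node_name: raw_text}} for each tile directory."""
    tiles = {}
    for entry in sorted(Path(partition_dir).iterdir()):
        match = TILE_DIR.match(entry.name)
        if not (match and entry.is_dir()):
            continue  # skip partition-level nodes such as core, dma, status
        nodes = {}
        for name in TILE_NODES:
            node = entry / name
            if node.is_file():  # nodes missing for this tile type are skipped
                nodes[name] = node.read_text().strip()
        tiles[(int(match.group(1)), int(match.group(2)))] = nodes
    return tiles
```

On a target, the function would be called with the partition path, for example `read_tile_nodes("/sys/class/aie/aiepart_<startcol_numcols>")` with the placeholder replaced by the actual partition name.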

The following is an example of AI Engine status output. At the aperture level, there is one hardware_info node that shows the device generation, the number of rows and columns, and the tile types.

# Print the hardware information from the sysfs node
xilinx-vek280-es1-20231:~$ cat /sys/class/aie/aieaperture_0_38/hardware_info
generation: aieml
total_cols: 38
total_rows: 11
shim_tile: start row: 0, num_rows: 1
memory_tile: start row: 1, num_rows: 2
aie_tile: start row: 3, num_rows: 8

The output format is independent of the device generation.
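Because the hardware_info output is a list of `key: value` lines, it is straightforward to parse in a script. The following Python sketch shows one possible approach, based only on the example output shown in this section. The function name `parse_hardware_info` and the choice to normalize `start row` to `start_row` are illustrative assumptions, not part of the driver.

```python
def parse_hardware_info(text):
    """Parse hardware_info text into a dict; tile-type lines become nested dicts."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # ignore blank lines or lines without a key
        key, value = key.strip(), value.strip()
        if ":" in value:
            # Tile-type line, for example "start row: 0, num_rows: 1".
            fields = {}
            for part in value.split(","):
                name, _, number = part.partition(":")
                # Assumption: normalize "start row" to "start_row" for scripting.
                fields[name.strip().replace(" ", "_")] = int(number)
            info[key] = fields
        elif value.isdigit():
            info[key] = int(value)
        else:
            info[key] = value
    return info


# Sample text copied from the example output in this section.
SAMPLE = """generation: aieml
total_cols: 38
total_rows: 11
shim_tile: start row: 0, num_rows: 1
memory_tile: start row: 1, num_rows: 2
aie_tile: start row: 3, num_rows: 8
"""

info = parse_hardware_info(SAMPLE)
tile_rows = sum(info[t]["num_rows"] for t in ("shim_tile", "memory_tile", "aie_tile"))
print(info["generation"], info["total_cols"], info["total_rows"], tile_rows)
```

On a target, the text would come from reading the hardware_info node instead of the embedded sample. In the sample, the row counts of the three tile types add up to `total_rows`, which a script could use as a basic consistency check.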