AI Engine Error/Events Routing Configuration - 2024.1 English - UG1642

AI Engine System Software Driver Reference Manual (UG1642)

Document ID: UG1642
Release Date: 2024-05-30
Version: 2024.1 English

This section describes the AI Engine error routing configuration. The AI Engine has a limited number of broadcast events, so only a limited number of events can be broadcast. Local events can be broadcast to adjacent modules through broadcast events to trigger control or debug operations, for example, a core reset triggered by one event or a debug halt triggered by another. Events can also be routed to external components through broadcast events. Events are aggregated into broadcast event signals. Each of these signals can be configured to broadcast only one local event of a module. However, by configuring group events or combo events, several local events can share one broadcast event. The following table describes the broadcast event arrangement.

Table 1. AI Engine Broadcast Events Table
Broadcast Events    Description
0                   Used by the driver to manage error events
1–7                 Used by the driver for its own purposes
8–15                Used by the application for application control events (for example, events for tracing or debugging)
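
The split in Table 1 can be expressed as a small lookup. The following is a minimal sketch in C that assumes 16 broadcast events numbered 0 to 15, as the table implies. The names `bcast_owner`, `bcast_owner_of`, and `bcast_mask_for` are hypothetical and are not part of the AI Engine driver API.

```c
#include <assert.h>
#include <stdint.h>

#define BCAST_CHANNELS 16 /* broadcast events 0-15, per Table 1 */

/* Hypothetical classification of a broadcast event by its user. */
enum bcast_owner {
    BCAST_OWNER_DRIVER_ERROR, /* event 0: driver error management */
    BCAST_OWNER_DRIVER,       /* events 1-7: used by the driver */
    BCAST_OWNER_APPLICATION,  /* events 8-15: application control events */
    BCAST_OWNER_INVALID       /* outside the 0-15 range */
};

/* Map a broadcast event number to its user according to Table 1. */
static enum bcast_owner bcast_owner_of(int channel)
{
    if (channel == 0)
        return BCAST_OWNER_DRIVER_ERROR;
    if (channel >= 1 && channel <= 7)
        return BCAST_OWNER_DRIVER;
    if (channel >= 8 && channel < BCAST_CHANNELS)
        return BCAST_OWNER_APPLICATION;
    return BCAST_OWNER_INVALID;
}

/* Build a 16-bit mask with one bit set per broadcast event of the owner. */
static uint16_t bcast_mask_for(enum bcast_owner owner)
{
    uint16_t mask = 0;

    assert(owner != BCAST_OWNER_INVALID);
    for (int ch = 0; ch < BCAST_CHANNELS; ch++) {
        if (bcast_owner_of(ch) == owner)
            mask |= (uint16_t)(1u << ch);
    }
    return mask;
}
```

An application that needs a broadcast event for tracing could, under this model, check `bcast_owner_of(channel) == BCAST_OWNER_APPLICATION` before claiming it, so that it does not collide with the events the driver uses.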

Error/event handling is not enabled by default; aiecompiler or the user application must enable it. The error/event configuration sequence is as follows:

  1. By default, the AI Engine driver uses the broadcast events as described in the previous table.
  2. Call the AI Engine driver API to initialize the AI Engine events handling. This function call sets up the broadcast network routing and does the following:
    • Blocks the east/west broadcasting for the broadcast events used for errors and notification events. The error and notification broadcast events only go vertically.
    • Configures the AI Engine interface tiles first level interrupt controller to capture the broadcast events for error and notification events.
    • Routes the events out from the AI Engine interface tiles first level interrupt controller of each column to an AI Engine interface tiles second level interrupt controller in the partition as shown in the following table.
  3. Set the AI Engine interface tiles second level interrupt controller to output to the AI Engine NPI interrupt.
  4. Configure the AI Engine NPI interrupt.
  5. Set up broadcast events to broadcast errors for all tiles.
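
To make the effect of the sequence concrete, the following is a minimal software model in C, not driver code. It assumes a small partition with one second level interrupt controller that has one input line per column; the actual mapping of columns to second level controllers is not specified here. The struct `partition` and the functions `partition_init_error_routing` and `partition_raise_error` are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define COLS 4       /* columns in the partition (assumed size) */
#define ROWS 3       /* row 0 models the AI Engine interface tile */
#define ERR_BCAST 0  /* broadcast event 0 carries errors (Table 1) */

/* Hypothetical model of the routing state; not a driver data structure. */
struct partition {
    uint8_t block_east_west[COLS][ROWS]; /* error broadcast blocked E/W */
    uint8_t err_bcast_on[COLS][ROWS];    /* tile broadcasts its errors */
    uint8_t l1_capture[COLS];            /* first level controller armed */
    uint8_t l1_to_l2_line[COLS];         /* second level input per column */
    uint8_t l2_to_npi;                   /* second level output enabled */
    uint8_t npi_enabled;                 /* NPI interrupt configured */
    uint32_t l1_status[COLS];
    uint32_t l2_status;
    int npi_irq;
};

/* Apply steps 2 to 5 of the sequence to the model. */
static void partition_init_error_routing(struct partition *p)
{
    memset(p, 0, sizeof(*p));
    for (int c = 0; c < COLS; c++) {
        for (int r = 0; r < ROWS; r++)
            p->block_east_west[c][r] = 1; /* step 2: vertical only */
        p->l1_capture[c] = 1;             /* step 2: capture at L1 */
        p->l1_to_l2_line[c] = (uint8_t)c; /* step 2: route L1 to L2 */
    }
    p->l2_to_npi = 1;                     /* step 3 */
    p->npi_enabled = 1;                   /* step 4 */
    for (int c = 0; c < COLS; c++)
        for (int r = 0; r < ROWS; r++)
            p->err_bcast_on[c][r] = 1;    /* step 5 */
}

/* Propagate an error raised in tile (col, row) through the model. */
static void partition_raise_error(struct partition *p, int col, int row)
{
    assert(col >= 0 && col < COLS && row >= 0 && row < ROWS);
    if (!p->err_bcast_on[col][row])
        return;
    for (int c = 0; c < COLS; c++) {
        /* With east/west blocked, only the source column is reached. */
        int reached = (c == col) || !p->block_east_west[col][row];
        if (!reached || !p->l1_capture[c])
            continue;
        p->l1_status[c] |= 1u << ERR_BCAST;
        p->l2_status |= 1u << p->l1_to_l2_line[c];
    }
    if (p->l2_status && p->l2_to_npi && p->npi_enabled)
        p->npi_irq = 1;
}
```

In this model, an error raised in column 2 sets only the column 2 bit in the second level status, which is what lets the interrupt be attributed to a column. Clearing `block_east_west` for the source tile makes every column report the same error, which illustrates why the sequence restricts error broadcasts to the vertical direction.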

After the AI Engine event broadcast network is configured, an AI Engine interrupt is routed to the AI Engine driver whenever a configured event occurs. The driver then traces back to determine which events caused the interrupt. The following table is an example of AI Engine error grouping.

Table 2. AI Engine Errors Grouping
Module    Error Group    Errors / Not Errors

Core Module
  FP Errors
    • FP Invalid
    • FP Div by Zero
    • FP Overflow
    • FP Underflow
  Saturation Errors
    • SRS Saturate (can happen very often)
    • UPS Saturate (can happen very often)
  Instruction Errors
    • Instruction Decompression Error
    • Instr Warning (Instr Event2)
    • Instr Error (Instr Event3)
  Access Errors
    • PM address out of range
    • PM Reg Access Failure
    • DM address out of range
    • DM access to unavailable
  Bus Errors
    • AXI4 Slave Error
  Stream Errors
    • TLAST in WSS words 0-2
    • Stream Pkt Parity Error
    • Control Pkt Error
  Lock Errors
    • Lock Access to unavailable
  ECC Errors
    • PM ECC Error 2bit
    • PM ECC Error Scrub 2bit
    • PM ECC Error Scrub Corrected (not error)
    • PM ECC Error 1bit (not error)

Memory Module
  Memory Parity Errors
    • DM Parity Error Bank 2
    • DM Parity Error Bank 3
    • DM Parity Error Bank 4
    • DM Parity Error Bank 5
    • DM Parity Error Bank 6
    • DM Parity Error Bank 7
  DMA Errors
    • DMA S2MM 0 Error
    • DMA S2MM 1 Error
    • DMA MM2S 0 Error
    • DMA MM2S 1 Error
  ECC Errors
    • DM ECC Error 2bit
    • DM ECC Error Scrub Corrected (not error)
    • DM ECC Error 1bit (not error)

AI Engine interface tiles
  Bus Errors
    • AXI4 Slave Tile Error
    • AXI4 Decode NSU Error
    • AXI4 Slave NSU Error
    • AXI4 Unsupported Traffic
    • AXI4 Unsecure Access in Secure Mode
    • AXI4 Byte Strobe Error
  Stream Error
    • Control Pkt Error
  DMA Error
    • DMA S2MM 0 Error
    • DMA S2MM 1 Error
    • DMA MM2S 0 Error
    • DMA MM2S 1 Error

System Level (PLM)
  Global Errors
    • PLL Lock Loss
    • Scan Clear Error
    • Critical Temperature (System reset)
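
When the driver traces an interrupt back to individual events, a grouping like Table 2 lets it decide which events to report. The following is a minimal sketch in C that covers only a few rows of the table. It assumes that entries not marked "(not error)" are treated as errors. The types and functions (`error_entry`, `error_lookup`, `count_reportable`) are hypothetical, and the event names are plain strings rather than the driver's event identifiers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum module { MOD_CORE, MOD_MEMORY, MOD_INTERFACE };

/* One row of the grouping: module, error group, event, and whether the
 * event counts as an error (0 for entries marked "(not error)"). */
struct error_entry {
    enum module mod;
    const char *group;
    const char *name;
    int is_error;
};

/* Subset of Table 2, for illustration only. */
static const struct error_entry error_table[] = {
    { MOD_CORE,      "Bus Errors",   "AXI4 Slave Error",             1 },
    { MOD_CORE,      "Lock Errors",  "Lock Access to unavailable",   1 },
    { MOD_CORE,      "ECC Errors",   "PM ECC Error 2bit",            1 },
    { MOD_CORE,      "ECC Errors",   "PM ECC Error Scrub 2bit",      1 },
    { MOD_CORE,      "ECC Errors",   "PM ECC Error Scrub Corrected", 0 },
    { MOD_CORE,      "ECC Errors",   "PM ECC Error 1bit",            0 },
    { MOD_MEMORY,    "ECC Errors",   "DM ECC Error 2bit",            1 },
    { MOD_MEMORY,    "ECC Errors",   "DM ECC Error Scrub Corrected", 0 },
    { MOD_MEMORY,    "ECC Errors",   "DM ECC Error 1bit",            0 },
    { MOD_INTERFACE, "Stream Error", "Control Pkt Error",            1 },
};

/* Find the table row for an event of a module, or NULL if not listed. */
static const struct error_entry *error_lookup(enum module mod,
                                              const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < sizeof(error_table) / sizeof(error_table[0]); i++) {
        if (error_table[i].mod == mod &&
            strcmp(error_table[i].name, name) == 0)
            return &error_table[i];
    }
    return NULL;
}

/* Count the latched events of a module that are listed as errors. */
static size_t count_reportable(enum module mod,
                               const char *const *names, size_t n)
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++) {
        const struct error_entry *e = error_lookup(mod, names[i]);
        if (e != NULL && e->is_error)
            count++;
    }
    return count;
}
```

Keeping the "not error" entries in the same table, rather than omitting them, means a corrected ECC event can still be looked up and logged without being reported as a failure.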