Types of Errors Reference - AM011

Versal Adaptive SoC Technical Reference Manual (AM011)

Document ID: AM011
Release Date: 2025-03-11
Revision: 1.7 English

This section lists the types and sources of system errors. Every error is also listed in the System Error Summary Tables.

Processor Errors

Processor system errors include the following:

  • RCU and PPU triple modular redundancy (TMR) errors
  • RPU lock-step errors
  • APU_GIC_ECC errors

RAM Array Errors

  • OCM memory
  • RPU TCM memory
  • APU system cache non-correctable memory
  • XRAM correctable and non-correctable memory
  • Embedded memory for DMA controller and I/O peripheral buffers

Interconnect Errors

  • APB programming interface access errors
  • Interconnect parity errors
  • Interface timeout errors

System Watchdog Timer Errors

The system watchdog timer (SWDT) expects specific software responses that are based on timed windows and system interrupts. If the timer determines that the system has a serious problem, it asserts a system error that is routed to the EAM and handled by the firmware. The firmware can reset part of the system, reset the entire system, or take other action.
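
The windowed behavior described above can be illustrated with a small behavioral model. This is a minimal sketch, not the SWDT register interface or PLM code: the class `WindowWatchdog`, the function `firmware_handle`, the `Action` names, and the tick-based timing are all hypothetical, and treating an early refresh as a fault is an assumption of this model rather than a statement from this manual.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class Action(Enum):
    """Hypothetical firmware responses (names are not from AM011)."""
    RESET_SUBSYSTEM = auto()   # reset part of the system
    RESET_SYSTEM = auto()      # reset all of the system
    OTHER = auto()             # any other policy-defined action


@dataclass
class WindowWatchdog:
    """Toy windowed watchdog; times are in arbitrary ticks."""
    window_open: int           # earliest tick at which a refresh is valid
    window_close: int          # deadline: no refresh by this tick is an error
    elapsed: int = 0
    error: bool = False        # stands in for the error signal sent to the EAM

    def tick(self, n: int = 1) -> None:
        """Advance time and latch an error if the deadline passes."""
        self.elapsed += n
        if self.elapsed >= self.window_close:
            self.error = True

    def refresh(self) -> None:
        """Software response; accepted only while the window is open."""
        if self.error:
            return                      # stays latched until firmware acts
        if self.elapsed < self.window_open:
            self.error = True           # assumption: early refresh is a fault
        else:
            self.elapsed = 0            # valid refresh restarts the period


def firmware_handle(wdt: WindowWatchdog, policy: Action) -> Optional[Action]:
    """Return the configured action if an error is pending, else None."""
    if not wdt.error:
        return None
    wdt.error = False                   # clear the latched error
    wdt.elapsed = 0                     # restart the watchdog period
    return policy


# Usage: one valid refresh, then a missed deadline handled by firmware.
wdt = WindowWatchdog(window_open=10, window_close=20)
wdt.tick(12)
wdt.refresh()                           # inside the window: accepted
wdt.tick(25)                            # deadline missed: error latched
chosen = firmware_handle(wdt, Action.RESET_SUBSYSTEM)
```

In this model the choice of response is passed in as `policy`, which mirrors the text: the timer only reports the error, and the firmware decides how much of the system to reset.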

Software Generated Errors

  • PS software correctable and non-correctable errors
  • PSM firmware correctable and non-correctable errors

Functional Safety Errors

A safety error occurs when logic or a memory cell changes state because of a physical anomaly. The system can detect these anomalies. When a safety error occurs, the system must remain in a safe state, which can require any of several actions. Broadly, safety errors fall into two categories, each with a typical response.

Correctable Error
A bit error is detected and corrected, usually by the hardware. The event is recorded and an interrupt is signaled.
Note: The typical response is for the PLM firmware to report the event to the system safety software so it can be monitored and analyzed.
Uncorrectable Error
An error that is detected but cannot be corrected. The event is recorded and an interrupt is signaled.
Note: The typical response is for the PLM firmware to indicate that a system-level intervention is required, which might include a partial or complete system reset.
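
The two categories and their typical responses can be summarized as a small dispatch routine. This is a sketch under stated assumptions, not the PLM error-handling API: `Severity`, `ErrorEvent`, `SafetyMonitor`, and the returned response strings are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Severity(Enum):
    CORRECTABLE = "correctable"
    UNCORRECTABLE = "uncorrectable"


@dataclass(frozen=True)
class ErrorEvent:
    source: str          # illustrative label such as "OCM"
    severity: Severity


@dataclass
class SafetyMonitor:
    """Hypothetical stand-in for firmware-side error handling."""
    log: List[ErrorEvent] = field(default_factory=list)
    interrupts: int = 0

    def handle(self, event: ErrorEvent) -> str:
        # Both categories: record the event and signal an interrupt.
        self.log.append(event)
        self.interrupts += 1
        if event.severity is Severity.CORRECTABLE:
            # Hardware has already corrected the bit; only report it.
            return "report_to_safety_software"
        # Cannot be corrected: escalate, possibly up to a reset.
        return "request_system_intervention"


# Usage: the same handler covers both categories.
monitor = SafetyMonitor()
first = monitor.handle(ErrorEvent("OCM", Severity.CORRECTABLE))
second = monitor.handle(ErrorEvent("XRAM", Severity.UNCORRECTABLE))
```

Recording and interrupt signaling are shared by both branches, so only the follow-up response differs between correctable and uncorrectable errors.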

Security Errors

For security errors, see the Versal Adaptive SoC Security Manual (UG1508). When a security error is detected, the system usually responds with a secure lockdown and zeroization of key system elements before issuing a reset to restart the system.