Types of Errors Reference

Versal Adaptive SoC Technical Reference Manual (AM011)

Document ID: AM011
Release Date: 2023-10-05
Revision: 1.6 English

There are many types and sources of system errors. The errors are grouped below. All errors are listed in the System Error Summary Tables.

Processor Errors

There are several processor system errors.

  • RCU and PPU triple modular redundancy (TMR) errors
  • RPU lock-step errors
  • APU_GIC_ECC errors

RAM Array Errors

  • OCM memory
  • RPU TCM memory
  • APU system cache non-correctable memory
  • XRAM correctable and non-correctable memory
  • Embedded memory for DMA controllers and I/O peripheral buffers

Interconnect Errors

  • APB programming interface access errors
  • Interconnect parity errors
  • Interface timeout errors

System Watchdog Timer Errors

The system watchdog timer (SWDT) expects specific responses from software within timed windows and in response to system interrupts. If the timer determines that the system has a serious problem, it asserts a system error that is routed to the PSM EAM and handled by the PSM firmware. The firmware can reset part of the system, reset all of the system, or take other action.
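The timed-window idea can be illustrated with a small behavioral model. This is a minimal sketch, not the SWDT register interface: the class name `WindowWatchdog`, the tick units, and the event names are all assumptions made for illustration. It only shows why a refresh that arrives too early or too late is treated as a fault.

```python
from dataclasses import dataclass
from enum import Enum


class WdtEvent(Enum):
    OK = "ok"                # refresh arrived inside the open window
    EARLY_REFRESH = "early"  # refresh arrived before the window opened
    TIMEOUT = "timeout"      # refresh arrived after the window closed


@dataclass
class WindowWatchdog:
    """Toy model of a windowed watchdog (hypothetical, not the SWDT API).

    Times are in arbitrary ticks measured from the last accepted refresh.
    """
    window_open: int    # earliest elapsed tick at which a refresh is accepted
    window_close: int   # latest elapsed tick at which a refresh is accepted
    last_refresh: int = 0

    def refresh(self, now: int) -> WdtEvent:
        """Classify a software refresh attempt made at tick `now`."""
        elapsed = now - self.last_refresh
        if elapsed < self.window_open:
            return WdtEvent.EARLY_REFRESH
        if elapsed > self.window_close:
            return WdtEvent.TIMEOUT
        # Only a refresh inside the window restarts the timing window.
        self.last_refresh = now
        return WdtEvent.OK

    def expired(self, now: int) -> bool:
        """True once the window has closed without an accepted refresh."""
        return now - self.last_refresh > self.window_close
```

In this model, any result other than `WdtEvent.OK` corresponds to the point at which the real timer would assert a system error toward the PSM EAM. Which reset or other action follows is a firmware policy decision and is not modeled here.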

Software Generated Errors

  • PS software correctable error
  • PS software non-correctable error
  • PSM firmware program correctable error
  • PSM firmware program non-correctable error
  • PSM hardware correctable error
  • PSM hardware non-correctable error

Functional Safety Errors

A safety error occurs when logic or a memory cell changes state due to a physical anomaly. The system can detect these anomalies. When a safety error occurs, the system must remain in a safe state, which can require one of several actions. Broadly, safety errors and their typical responses fall into two categories.

Correctable Error
A bit error is detected and corrected, usually by the hardware. The event is recorded and an interrupt is signaled.
Note: The typical response is for the platform loader and manager (PLM) to report the event to the system safety software so it can be monitored and analyzed.
Uncorrectable Error
An error that is detected but cannot be corrected. The event is recorded and an interrupt is signaled.
Note: The typical response is for the PLM to indicate that a system-level intervention is required, which might include a partial or complete system reset.
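The two typical responses can be sketched as a simple dispatch on error severity. This is an illustrative model only: `Severity`, `SafetyMonitor`, and the returned strings are hypothetical names, not PLM or safety-software interfaces, and the actual response is system specific.

```python
from dataclasses import dataclass, field
from enum import Enum


class Severity(Enum):
    CORRECTABLE = "correctable"
    UNCORRECTABLE = "uncorrectable"


@dataclass
class SafetyMonitor:
    """Hypothetical stand-in for the bookkeeping done by safety software."""
    correctable_log: list = field(default_factory=list)
    intervention_requested: bool = False

    def handle(self, source: str, severity: Severity) -> str:
        """Apply the typical response for an already-recorded error event."""
        if severity is Severity.CORRECTABLE:
            # Already corrected (usually by hardware); keep a record so the
            # event can be monitored and analyzed over time.
            self.correctable_log.append(source)
            return "reported"
        # Not correctable: flag that a system-level intervention is needed,
        # which might be a partial or complete reset chosen elsewhere.
        self.intervention_requested = True
        return "intervention_required"
```

Logging correctable events rather than discarding them matters in practice, because a rising rate of corrected errors from one source can be analyzed before it becomes an uncorrectable failure.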

Security Errors

A security error occurs when a secure asset is exposed. When a security error is detected, the system usually responds with a secure lockdown and zeroization of key system elements before a reset and restart is issued.
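The ordering of that response (lock down, zeroize, then reset) can be sketched as follows. This is a toy model under stated assumptions: `secure_lockdown` and the `key_store` buffer are hypothetical, and real zeroization is performed by device hardware and firmware rather than by application code like this.

```python
def secure_lockdown(key_store: bytearray) -> list:
    """Model the lockdown sequence and return the steps in the order taken.

    `key_store` is a hypothetical in-memory buffer standing in for the
    key system elements that must be cleared.
    """
    steps = ["lockdown"]  # block further access before touching secrets

    # Overwrite the buffer in place so no copy of the old contents remains.
    for i in range(len(key_store)):
        key_store[i] = 0

    # Read back to confirm the overwrite before allowing a restart.
    if any(key_store):
        raise RuntimeError("zeroization could not be verified")
    steps.append("zeroize")

    steps.append("reset")  # the reset is issued only after zeroization
    return steps
```

The point of the ordering is that the reset comes last, so sensitive contents are cleared and verified before the system restarts.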