Error Management

Versal Adaptive SoC Technical Reference Manual (AM011)

Document ID: AM011
Release Date: 2023-10-05
Revision: 1.6 English

Error management covers several areas:

  • Monitoring
  • Containment
  • Crash dump
  • Debugging
  • Recovery

See the Versal Adaptive SoC System Software Developers Guide (UG1304) for more error management information.

PL Soft Error Mitigation

The Versal adaptive SoC PMC hardware can validate the integrity of the device configuration and read back configuration data in the background using the Xilinx Soft Error Mitigation (XilSEM) library.

The XilSEM library is a pre-configured, pre-verified solution that detects and optionally corrects soft errors in the configuration memory. Soft errors are caused by ionizing radiation and are extremely uncommon in commercial terrestrial operating environments. While a soft error does not damage the device, it carries a small statistical possibility of transiently altering the device behavior.

The XilSEM library does not prevent soft errors. However, it provides a way to better manage their possible system-level effects. Proper management of a soft error can increase reliability and availability, and reduce system maintenance and downtime. In most applications, soft errors can be ignored. In applications where a soft error cannot be ignored, see the BSP and Libraries Document Collection (UG643) for additional information about the XilSEM library before configuring it for use through the CIPS IP core.
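The detect-and-optionally-correct idea can be illustrated with a small conceptual model. The sketch below is not the XilSEM algorithm or API. It assumes a hypothetical configuration memory made of four 16-byte frames, uses CRC-32 as a stand-in integrity check, and repairs a corrupted frame from a known-good copy. The names golden_checksums, scan, and flip_bit are invented for illustration.

```python
import zlib


def golden_checksums(frames):
    """Record a CRC-32 per frame while the memory is known to be good."""
    return [zlib.crc32(frame) for frame in frames]


def scan(frames, golden):
    """Read back every frame and return the indices whose CRC has changed."""
    return [i for i, frame in enumerate(frames) if zlib.crc32(frame) != golden[i]]


def flip_bit(frame, bit):
    """Return a copy of the frame with one bit inverted (a simulated upset)."""
    data = bytearray(frame)
    data[bit // 8] ^= 1 << (bit % 8)
    return bytes(data)


# Hypothetical configuration memory: four 16-byte frames (an assumption).
frames = [bytes([i] * 16) for i in range(4)]
golden = golden_checksums(frames)
reference = list(frames)  # known-good copy, used here only for the repair step

# Inject a single-bit upset into frame 2, then run one readback scan.
frames[2] = flip_bit(frames[2], 5)
corrupted = scan(frames, golden)

# Optional correction: restore each corrupted frame from the reference copy.
for i in corrupted:
    frames[i] = reference[i]
```

In this model the scan flags only the frame that received the upset, and a second scan after the repair step reports no mismatches. Whether to run the repair step corresponds to the "optionally correct" choice described above.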

Firmware Response to Errors

The response of the PLM and PSM firmware to errors can be configured by downloading a configuration data object (CDO) into the processor RAM. For example, a fatal NoC error can be programmed to trigger a device-level reset or only an interrupt to the PLM or PSM firmware.
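As a rough illustration of a configurable error response, the sketch below models a table that maps error names to actions and is populated from a configuration object. It is a simplified model under stated assumptions, not the PLM or PSM firmware interface or the CDO format. The class ErrorManager, the Action values, the error name "noc_fatal", and the interrupt-only default are all hypothetical.

```python
from enum import Enum


class Action(Enum):
    """Hypothetical responses; the real firmware defines its own set."""
    INTERRUPT = "interrupt"        # notify firmware only
    DEVICE_RESET = "device_reset"  # device-level reset


class ErrorManager:
    """Toy error-response table loaded from a configuration object."""

    def __init__(self, default=Action.INTERRUPT):
        # Assumption: errors without an explicit entry raise an interrupt.
        self._default = default
        self._actions = {}

    def load(self, config):
        """Apply a mapping of error name -> action string."""
        for name, action in config.items():
            self._actions[name] = Action(action)  # rejects unknown actions

    def respond(self, name):
        """Return the configured action for an error, or the default."""
        return self._actions.get(name, self._default)


manager = ErrorManager()
manager.load({"noc_fatal": "device_reset"})

fatal_response = manager.respond("noc_fatal")
other_response = manager.respond("unlisted_error")
```

Loading a different configuration object changes the response without changing the handler code, which is the point of keeping the error-to-action mapping in data.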