The Xilinx Soft Error Mitigation (XilSEM) library is a pre-configured, pre-verified solution to detect and optionally correct soft errors in Configuration Memory of Versal ACAPs. A soft error is caused by ionizing radiation and is extremely uncommon in commercial terrestrial operating environments. While a soft error does not damage the device, it carries a small statistical possibility of transiently altering the device behavior.
The XilSEM library does not prevent soft errors; however, it provides a method to better manage the possible system-level effect. Proper management of a soft error can increase reliability and availability, and reduce system maintenance and downtime. In most applications, soft errors can be ignored. In applications where a soft error cannot be ignored, this library documentation provides information to configure the XilSEM library through the CIPS IP core and interact with the XilSEM library at runtime.
The XilSEM library is part of the Platform Loader and Manager (PLM) which is loaded into and runs on the Platform Management Controller (PMC).
When a soft error occurs, one or more bits of information are corrupted. The information affected might be in the device Configuration Memory (which determines the intended behavior of the design) or might be in design memory elements (which determine the state of the design). In the Versal architecture, Configuration Memory includes Configuration RAM and NPI Registers. Configuration RAM is associated with the Programmable Logic (PL) and is used to configure the function of the design loaded into the PL. This includes function block behavior and function block connectivity. This memory is physically distributed across the PL and is the larger category of Configuration Memory by bit count. Only a fraction of these bits are essential to the correct operation of the associated resources for the currently loaded design.
NPI Registers are associated with other function blocks which can be configured independently from the PL. The Versal Architecture integrated shell is a notable application of function blocks configured in this manner, making it possible for the integrated shell to be operational with the PL in an unconfigured state. This memory is physically distributed throughout the device and is the smaller category of Configuration Memory by bit count. Only a fraction of these bits are essential to the correct operation of the associated resources for the currently loaded design.
Prior to configuring the XilSEM library for use through the CIPS IP core, assess whether the XilSEM library helps meet the soft error mitigation goals of your system’s deployment and what mitigation approaches should be included, if any. There are two main considerations:
- Understanding the soft error mitigation requirements for your system
- What design and system actions need to be taken if a soft error occurs
If the effects of soft errors are a concern for your system’s deployment, each component in the system design will usually need a soft error rate (SER) budget distributed from a system-level SER budget. To make estimates for the contribution from a Versal ACAP, use the Xilinx SEU Estimator tool, which is tentatively planned for availability after the UG116 publication of production qualification data for Xilinx 7 nm technology. To estimate SER for a deployment, at the minimum you need:
- Target device selection(s) to be deployed
- Estimated number of devices in deployment
In addition to providing a device level estimate, the Xilinx SEU Estimator also estimates the number of soft errors statistically likely to occur across the deployment for a selected operation time interval. After an estimate has been generated, it must be used to evaluate if the SER requirement for the deployment is met. Other design approaches might need to be implemented to reduce the system-level SER. These approaches include addition of mitigation solutions in the system, which might be suitable for implementation in any devices in the system which can offer such features. Versal ACAPs offer many supporting features, including the XilSEM library. The XilSEM library can be configured to detect or detect and correct soft errors. Therefore, consideration must be given to the system-level response, if any, that must be taken in response to a soft error in the Versal ACAP. If no action is taken when a soft error is detected in the Versal ACAP, the benefits of using the XilSEM library should be assessed and understood. While this is a valid use case, the effects of soft errors in the Versal ACAP should be anticipated and this approach should be reviewed to ensure it supports the system-level soft error mitigation strategy.
If there is no SER target at the system level, then it is unclear what system changes (and tradeoffs that come with them) need to be made to mitigate soft errors, including whether a deployment benefits from use of the XilSEM library.