Error Injection Guidance - 3.1 English

UltraScale Architecture Soft Error Mitigation Controller LogiCORE IP Product Guide (PG187)

Document ID: PG187
Release Date: 2023-11-08
Version: 3.1 English

The SEM IP supports error injection for verifying the controller and for evaluating applications that use the controller. In general, the objective of the error injection exercise should determine the type of error injection performed.

Objective: Basic Error Injection Testing

This type of error injection is appropriate when you have the following goals:

  • Confirm that the controller detects and corrects errors.
  • Confirm that the system logs detected and corrected errors as expected.
  • Testing system behavior when an error impacts design function is not a goal.

Location of error injections:

  • Inject errors in the ECC word of a frame.
  • For AMD UltraScale™ architecture, the ECC word spans words 60 and 61 of each frame. AMD recommends injecting errors in the low byte of word 61.
  • For AMD UltraScale+™ FPGAs and MPSoCs, the ECC word spans words 45 and 46 of each frame. AMD recommends injecting errors in the low byte of word 46. Many UltraScale+ devices contain configuration memory address ranges that are reserved for processing system integration, regardless of whether a processing system is present or used. Errors injected into these reserved configuration memory addresses are not detected. As general guidance for selecting an address that maps to physically implemented configuration memory, AMD recommends using linear frame addresses larger than 2/3 × Max Linear Frame.

Errors injected in this location do not interfere with the controller or design function.
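The location guidance above can be turned into a small address-selection helper. The following Python sketch is illustrative only: the function and constant names are hypothetical, the reading of "low byte" as the least significant 8 bits of the word is an assumption, and the maximum LFA of 30,000 is a made-up value used to exercise the arithmetic. The upper bound uses the valid range given in the general guidance later in this section (0 to maximum LFA minus 2).

```python
# Recommended injection word per family, as stated in the guidance above;
# the target is the low byte of this word in each frame.
RECOMMENDED_WORD = {"UltraScale": 61, "UltraScale+": 46}

# Assumption: "low byte" means the least significant 8 bits of the word.
LOW_BYTE_BITS = range(0, 8)


def min_recommended_lfa(max_lfa: int) -> int:
    """Smallest integer LFA strictly larger than 2/3 x max_lfa (UltraScale+ guidance)."""
    return (2 * max_lfa) // 3 + 1


def recommended_lfa_range(max_lfa: int) -> range:
    """LFAs above the 2/3 threshold that also lie within 0..max_lfa - 2."""
    return range(min_recommended_lfa(max_lfa), max_lfa - 1)


# 30,000 is a made-up maximum LFA, not a value for any real device.
candidates = recommended_lfa_range(30_000)
print(candidates[0], candidates[-1])  # prints: 20001 29998
```

Integer arithmetic is used for the threshold so that the result is exact even when 2/3 × the maximum LFA is not a whole number.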

If the objective of error injection does not fall into this scope, open a support case for guidance.

The following is general guidance for injecting errors through the Monitor or Command Interface of the controller:

  • Perform error injection using Linear Frame Addresses (LFA). The valid address range is 0 to the maximum LFA value minus 2. For example, the maximum LFA for an XCKU040 device is 26,179, so the valid range of addresses for error injection is 0 to 26,177 (26,179 – 2). The maximum LFA value for a given device is recorded in Table 2, and the controller also reports it in its status report (MF {8-digit hex value}).
  • Always perform a Query command before and after error injection to verify that the error injection was successful and the bit is not masked from reads and writes.
  • Inject one error at a time, and confirm that the controller transitions to the Injection state and then returns to the Idle state before performing the next injection. If the controller is not in the Idle state, the error injection instruction can be dropped or lost.
  • AMD recommends using the Monitor or UART Interface to perform and monitor error injection because this interface provides the most verbose information.
  • When using the Command Interface, you must monitor the Status Interface to verify that the controller is in the Idle state before injecting any error, and to verify that the controller transitions to the Injection state after the command is given.
  • If a single bit error is injected and is not detected by the controller, inject an error to the same bit again before proceeding to inject an error in a different bit.
  • If injecting more than one bit at a time, back out all the errors that are not corrected before performing another set of error injections. Alternatively, reprogram the device before resuming error injection testing.
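The addressing and sequencing guidance above can be sketched in Python. This is only an illustration: the `Controller` interface and its `state`, `query`, and `inject` methods are hypothetical stand-ins for whatever driver wraps the Monitor or Command Interface, the state names are placeholder strings, and the sample line `MF 00006643` is constructed from the XCKU040 value quoted above rather than captured from hardware.

```python
import re
from typing import List, Protocol


class Controller(Protocol):
    """Hypothetical driver interface; these methods are not a SEM IP API."""

    def state(self) -> str: ...
    def query(self, lfa: int, word: int, bit: int) -> int: ...
    # Assumed to send the injection and return the states observed afterwards.
    def inject(self, lfa: int, word: int, bit: int) -> List[str]: ...


def parse_max_lfa(status_line: str) -> int:
    """Read the maximum LFA from a status line shaped 'MF {8-digit hex value}'."""
    match = re.fullmatch(r"MF\s+([0-9A-Fa-f]{8})", status_line.strip())
    if match is None:
        raise ValueError(f"not an MF status line: {status_line!r}")
    return int(match.group(1), 16)


def is_valid_injection_lfa(lfa: int, max_lfa: int) -> bool:
    """Valid injection addresses run from 0 to the maximum LFA minus 2."""
    return 0 <= lfa <= max_lfa - 2


def inject_one(ctrl: Controller, lfa: int, word: int, bit: int, max_lfa: int) -> bool:
    """Inject a single error; return True if Query shows the bit changed."""
    if not is_valid_injection_lfa(lfa, max_lfa):
        raise ValueError(f"LFA {lfa} is outside 0..{max_lfa - 2}")
    if ctrl.state() != "IDLE":
        raise RuntimeError("controller is not Idle; the injection could be dropped")
    before = ctrl.query(lfa, word, bit)  # Query before injection
    seen = ctrl.inject(lfa, word, bit)   # one error at a time
    # Expect a pass through the Injection state, ending back in Idle.
    if "INJECTION" not in seen or seen[-1] != "IDLE":
        raise RuntimeError(f"unexpected state sequence: {seen}")
    after = ctrl.query(lfa, word, bit)   # Query after injection
    return before != after


max_lfa = parse_max_lfa("MF 00006643")  # sample line built from 26,179
print(max_lfa, is_valid_injection_lfa(26177, max_lfa), is_valid_injection_lfa(26178, max_lfa))
# prints: 26179 True False
```

Refusing to inject unless the controller reports Idle mirrors the warning that an injection instruction issued in any other state can be dropped or lost.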

Special considerations when injecting errors:

  • Because some frames are masked or not implemented, an injected error might not be detected (and hence not corrected). Performing a Query command before and after error injection gives insight into what to expect.
  • Masked frames are usually related to dynamic memory (DRP, SRL, etc.).
  • If an injected error causes the controller to report an uncorrectable error, reconfigure the device before doing any more testing.
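One way to encode the follow-up actions described in this section is a small decision helper. The sketch below is one reading of the guidance, not an AMD-provided procedure: the `NextStep` enum and both function names are hypothetical, and treating an unchanged Query result as a sign of a masked or unimplemented bit is an assumption drawn from the first bullet above.

```python
from enum import Enum


class NextStep(Enum):
    PROCEED = "proceed to the next injection"
    REINJECT_SAME_BIT = "inject the same bit again before moving on"
    RECONFIGURE = "reconfigure the device before further testing"


def next_step(detected: bool, uncorrectable: bool) -> NextStep:
    """Choose the follow-up action after a single-bit injection."""
    if uncorrectable:
        # An uncorrectable error report calls for reconfiguration.
        return NextStep.RECONFIGURE
    if not detected:
        # An undetected single-bit error is injected again at the same bit.
        return NextStep.REINJECT_SAME_BIT
    return NextStep.PROCEED


def likely_masked(before: int, after: int) -> bool:
    """Assumption: an unchanged Query value suggests a masked or unimplemented bit."""
    return before == after


print(next_step(detected=False, uncorrectable=False).value)
# prints: inject the same bit again before moving on
```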