Errors detected by a component in the CMN are classified into three main categories, as shown in the following table:
Error Type | Description | Examples | Action taken by hardware |
---|---|---|---|
Correctable Errors | Errors that can be corrected using ECC or other methods. | | 1. Logs the error. 2. Counts the occurrence of these errors. 3. Signals the error to the global RAS block; this signaling can be controlled using a threshold count. |
Deferred Errors | Uncorrectable errors detected in one node of the CMN where the data is not used within that node and poison bits are set for the data. With these errors, the system can typically operate for a period of time without being corrupted. | A request packet received with an unsupported opcode. | 1. Sends a response with a RespErr value of data error or non-data error. 2. Logs the error. 3. Signals the error to the global RAS block within the CMN. |
Uncorrectable Fatal Errors | Errors in the control logic of a node, where continuing operation might corrupt the system beyond recovery. | | 1. Logs the error. 2. Signals the error to the global RAS block. |
The global RAS (reliability, availability, and serviceability) block raises the corresponding error interrupts, which are handled by the CCIX firmware. The firmware is responsible for generating the protocol error reporting (PER) message packet, as appropriate. Details on PER generation are covered in CCIX Capable PCIe Controller.
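
The three error categories and their hardware actions can be modelled in a few lines of Python. This is only an illustrative sketch, not the CMN register interface: the names `ErrorClass` and `NodeErrorModel`, the action strings, and the default threshold are all hypothetical. It also assumes that a correctable error is signalled once the count reaches the threshold and that the count then restarts, which the table does not specify.

```python
from dataclasses import dataclass, field
from enum import Enum


class ErrorClass(Enum):
    CORRECTABLE = "correctable"
    DEFERRED = "deferred"
    UNCORRECTABLE_FATAL = "uncorrectable_fatal"


@dataclass
class NodeErrorModel:
    """Toy model of one node's error handling (hypothetical, not a register map)."""
    ce_threshold: int = 4  # assumed threshold for signalling correctable errors
    ce_count: int = 0
    log: list = field(default_factory=list)
    ras_signals: list = field(default_factory=list)

    def report(self, error_class, detail=""):
        """Apply the actions listed in the table and return them in order."""
        actions = ["log"]
        self.log.append((error_class, detail))  # every category logs the error

        if error_class is ErrorClass.CORRECTABLE:
            self.ce_count += 1
            actions.append("count")
            # Assumption: signal when the count reaches the threshold, then restart.
            if self.ce_count >= self.ce_threshold:
                self.ras_signals.append(error_class)
                actions.append("signal_ras")
                self.ce_count = 0
        elif error_class is ErrorClass.DEFERRED:
            # The response carrying RespErr is sent before the error is logged.
            actions.insert(0, "respond_with_resperr")
            self.ras_signals.append(error_class)
            actions.append("signal_ras")
        else:
            # Uncorrectable fatal: log and signal, nothing else.
            self.ras_signals.append(error_class)
            actions.append("signal_ras")
        return actions


if __name__ == "__main__":
    node = NodeErrorModel(ce_threshold=2)
    print(node.report(ErrorClass.CORRECTABLE))          # ['log', 'count']
    print(node.report(ErrorClass.CORRECTABLE))          # ['log', 'count', 'signal_ras']
    print(node.report(ErrorClass.DEFERRED, "bad opcode"))
    print(node.report(ErrorClass.UNCORRECTABLE_FATAL))  # ['log', 'signal_ras']
```

In this sketch, `ras_signals` stands in for the global RAS block. What the firmware does on each interrupt, including PER message generation, is deliberately left out.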