The following steps describe how to recover a QP that has entered the FATAL condition: how to clear the existing traffic on the QP, and how to re-initialize it so that it can be reused.
Steps to clear traffic on QP:
1. On detecting the QP FATAL interrupt, read the IPKTERRQBA register to get the base address of the Incoming Pkt Error Status Queue, as written by the driver.
2. Read the entry at this base address to identify which QP went into FATAL (Bits[31:16]) and to check the QP FATAL status code (Bits[15:0]); see Table: Decoding for FATAL Codes.
3. Stop pushing any further SQ PI doorbells.
4. Set the “QP under recovery” bit in the QPCONFi register to 1.
5. Poll the STATQPi register until the “send Q empty” and “outstanding Q empty” bits are both 1.
6. Poll the CQHEADi register until its value is the same as the SQPIi register.
7. Poll the RESPHNDSTS register until the “sq pici db check en” bit (16th bit) is set.
8. Set the QPCONFi register “QP enable” bit to 0 and “QP under recovery” bit to 1.
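The decoding in step 2 and the polling in steps 5 to 7 can be sketched as follows. This is a minimal illustration, not driver code: `decode_fatal_entry` and `poll_until` are hypothetical helper names, the register read is passed in as a callable because this section does not describe the access mechanism, and the timeout values are arbitrary assumptions.

```python
import time
from typing import Callable, Tuple


def decode_fatal_entry(entry: int) -> Tuple[int, int]:
    """Split a 32-bit error status queue entry into (QP number, FATAL code)."""
    qp_number = (entry >> 16) & 0xFFFF  # Bits[31:16]: QP that went into FATAL
    fatal_code = entry & 0xFFFF         # Bits[15:0]: QP FATAL status code
    return qp_number, fatal_code


def poll_until(read: Callable[[], int], done: Callable[[int], bool],
               timeout_s: float = 1.0, interval_s: float = 0.001) -> int:
    """Call read() repeatedly until done(value) is true; raise on timeout.

    `read` is a hypothetical callable that returns the current register
    value, and `done` is the condition to wait for (for example, a bit test).
    """
    deadline = time.monotonic() + timeout_s
    while True:
        value = read()
        if done(value):
            return value
        if time.monotonic() >= deadline:
            raise TimeoutError("register did not reach the expected state")
        time.sleep(interval_s)
```

For example, `decode_fatal_entry(0x00030012)` returns `(3, 0x12)`. Bounding each poll with a timeout is a design choice made here so that a QP that never drains is reported rather than waited on indefinitely.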
Steps to reinitialize the QP:
1. Poll the CQHEADi register until its value is the same as the SQPIi register.
2. Poll the RESPHNDSTS register until the “sq pici db check en” bit (16th bit) is set.
3. Set the “SW OVERRIDE” bit in the XRNICADCONF register to 1.
4. Initialize the following QP registers to 0:
° STATRQPIDBi
° STATRQBUFCAi
° STATRQBUFCAMSBi
° RQCIi
° STATCURSQPTRi
° SQPIi
° SQPSNi
° LSTRQREQi
° STATMSN
° CQHEADi
5. Poll the CQHEADi register until its value is 0.
6. Initialize the following registers with their new values:
° SQPSNi
° LSTRQREQi
7. Initialize the following Ethernet side registers:
° MACDESADDMSBi
° MACDESADDLSBi
° IPDESADDR1i
° IPDESADDR2i
° IPDESADDR3i
° IPDESADDR4i
8. Re-configure the IP version in the QPCONFi register.
9. Re-initialize the “RNR nack count” and “retry count” in the STATQPi register.
10. Set the ACCESSDESC register “access type” for that QP to 'b10.
11. Re-program the QPCONFi register by re-initializing fields such as “RQ interrupt enable”, “CQ interrupt enable”, “PMTU”, and “HW handshake disable”. Selectively enable “CQE write enable” to debug error completions, and re-initialize “RQ Buffer size”.
12. Re-program the QPADVCONFi register by randomizing “Traffic class”, “Time to live”, and “Partition key”.
13. Set “QP enable” to 1 and “QP under recovery” to 0 in the QPCONFi register.
14. Clear the “SW OVERRIDE” bit in the XRNICADCONF register.
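The ordering of the re-initialization steps, with the software override bit set in step 3 and cleared again in step 14, can be sketched as follows. This is an illustration under stated assumptions: the register file is modeled as a plain dictionary keyed by register name, `sw_override` and `zero_qp_state` are hypothetical helpers, and `SW_OVERRIDE_MASK` is a placeholder because the bit position is not given in this section.

```python
from contextlib import contextmanager
from typing import Dict, Iterator

# Per-QP registers that step 4 resets to 0 (names as listed above).
ZEROED_REGISTERS = (
    "STATRQPIDBi", "STATRQBUFCAi", "STATRQBUFCAMSBi", "RQCIi",
    "STATCURSQPTRi", "SQPIi", "SQPSNi", "LSTRQREQi", "STATMSN", "CQHEADi",
)

# Placeholder only: the real bit position is not stated in this section.
SW_OVERRIDE_MASK = 1 << 0


@contextmanager
def sw_override(regs: Dict[str, int]) -> Iterator[None]:
    """Set the override bit on entry (step 3) and clear it on exit (step 14)."""
    regs["XRNICADCONF"] = regs.get("XRNICADCONF", 0) | SW_OVERRIDE_MASK
    try:
        yield
    finally:
        # Other XRNICADCONF bits are preserved; only the override bit is cleared.
        regs["XRNICADCONF"] &= ~SW_OVERRIDE_MASK


def zero_qp_state(regs: Dict[str, int]) -> None:
    """Write 0 to every step 4 register, then check CQHEADi reads 0 (step 5)."""
    for name in ZEROED_REGISTERS:
        regs[name] = 0
    # On hardware this would be a poll; the dictionary model updates at once.
    if regs["CQHEADi"] != 0:
        raise RuntimeError("CQHEADi did not reset to 0")
```

Steps 4 to 13 would run inside a `with sw_override(regs):` block. Clearing the bit in a `finally` clause means it is also cleared if an intermediate step fails; whether that is the desired behavior on a failed recovery is an assumption of this sketch.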