Datapath Analysis and First Principles

Replacing FMEA with Datapath Analysis for IP Designs (WP545)

Document ID: WP545
Release Date: 2023-06-14
Revision: 1.0 English

To answer questions about random hardware failure rates, AMD products must be analyzed for their susceptibility to random hardware failures and for the resulting failure rates. For a component to qualify for a functional safety application, the system must be able to detect a failure. The detection method can be implemented by the system integrator (for example, using a redundant device) or by the designers in the device itself (for example, parity). The system integrator (customer) is required to analyze the quality of the diagnostic elements with respect to their ability to detect and report a fault.

For business reasons, integrating diagnostics on chip, together with the associated analysis, provides value to customers. This value is what drives datapath analysis. For the most part, AMD products are designed to cover multiple use cases, that is, out of context. In all applications, their intended use is processing data, either through hardware-implemented datapaths or through instructions that drive CPUs to process data and make decisions. The key to datapath analysis is understanding that data processing can be broken down into these four actions, or first principles:

  • Data transformation
  • Data analysis
  • Data transportation
  • Data storage

The IP designer uses these four first principles to determine what diagnostic is required to detect unintended or dangerous operation of the component for functional safety applications.

In the case of data transformation or data analysis (for example, a floating-point operation fails or a compare-with-zero fails), if the failure modes are transient, then temporal diversity, that is, executing the operation twice on the same hardware and comparing the results, provides acceptable diagnostic coverage. If the failure modes are permanent, then redundancy, that is, executing the same operation on different hardware and comparing the results, is the required diagnostic.
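The two diagnostics can be illustrated in software. The following is a minimal sketch, not an AMD implementation: the names `run_with_temporal_diversity`, `run_with_redundancy`, `DiagnosticError`, and `make_transient_fault_adder` are hypothetical, and Python callables stand in for hardware execution units.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class DiagnosticError(Exception):
    """Raised when two executions of the same operation disagree."""


def run_with_temporal_diversity(op: Callable[[], T]) -> T:
    # Temporal diversity: run the SAME unit twice and compare the results.
    first = op()
    second = op()
    if first != second:
        raise DiagnosticError("results differ between the two executions")
    return first


def run_with_redundancy(primary: Callable[[], T], secondary: Callable[[], T]) -> T:
    # Redundancy: run the operation on two DIFFERENT units and compare.
    a = primary()
    b = secondary()
    if a != b:
        raise DiagnosticError("results differ between the two units")
    return a


def make_transient_fault_adder(a: int, b: int) -> Callable[[], int]:
    # Hypothetical fault model: the first execution has its lowest result
    # bit flipped (a transient upset); later executions are correct.
    calls = {"count": 0}

    def add() -> int:
        calls["count"] += 1
        upset = 1 if calls["count"] == 1 else 0
        return (a + b) ^ upset

    return add
```

In this model, a unit with a permanent fault returns the same wrong value on every execution, so the temporal comparison passes and only the comparison against a second unit exposes the error. This mirrors the distinction drawn in the paragraph above.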

In the case of data transportation, where a data packet can be corrupted while moving from one location to another, check bits are used to validate the data packet for both transient and permanent failure modes. Numerous check bit techniques exist, from cyclic redundancy checks to forward error correction, with which an error in a data packet can be detected and, in some cases, corrected.
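As an illustration of check bits on a data packet, the following minimal sketch appends a CRC-32 to a payload and validates it on receipt. The framing (payload followed by a 4-byte big-endian CRC) and the function names `frame_packet` and `check_packet` are assumptions made for this example. It shows detection only, not forward error correction.

```python
import struct
import zlib


def frame_packet(payload: bytes) -> bytes:
    # Append the CRC-32 of the payload as four big-endian check bytes.
    return payload + struct.pack(">I", zlib.crc32(payload))


def check_packet(frame: bytes) -> bytes:
    # Recompute the CRC at the receiver and compare it with the check bytes.
    if len(frame) < 4:
        raise ValueError("frame too short to contain check bytes")
    payload = frame[:-4]
    (received_crc,) = struct.unpack(">I", frame[-4:])
    if zlib.crc32(payload) != received_crc:
        raise ValueError("CRC mismatch: packet corrupted in transit")
    return payload


def flip_bit(frame: bytes, bit_index: int) -> bytes:
    # Helper that models a single-bit corruption on the transport path.
    corrupted = bytearray(frame)
    corrupted[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(corrupted)
```

A receiver calls `check_packet` on every frame. A frame modified by `flip_bit` is rejected with `ValueError`, whether the flipped bit falls in the payload or in the check bytes.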

In the case of data storage, where data is at rest, the devices holding the data are exposed to single event upsets for a much longer period of time. This exposure is a problem unique to storage in data processing. For both transient and permanent errors, check bits are used to validate the correct address of the data and sometimes to correct the stored data. For transient errors, patrolling the data (reading, correcting, and updating) is used to mitigate error accumulation.
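The effect of patrolling on error accumulation can be sketched with a small single-error-correcting code. The example below uses a Hamming(7,4) code chosen purely for illustration; the source does not name a specific code, and the function names are hypothetical. It models correction of stored data only and does not cover address validation.

```python
from typing import List, Tuple


def hamming74_encode(nibble: int) -> int:
    # Data bits go to positions 3, 5, 6, 7 (1-indexed); parity to 1, 2, 4.
    bits = [0] * 8  # index 0 unused
    bits[3], bits[5], bits[6], bits[7] = [(nibble >> i) & 1 for i in range(4)]
    bits[1] = bits[3] ^ bits[5] ^ bits[7]
    bits[2] = bits[3] ^ bits[6] ^ bits[7]
    bits[4] = bits[5] ^ bits[6] ^ bits[7]
    return sum(bits[p] << (p - 1) for p in range(1, 8))


def hamming74_correct(word: int) -> Tuple[int, bool]:
    # The syndrome is the XOR of the positions of all set bits: zero for a
    # valid codeword, otherwise the position of a single flipped bit.
    syndrome = 0
    for p in range(1, 8):
        if (word >> (p - 1)) & 1:
            syndrome ^= p
    if syndrome:
        word ^= 1 << (syndrome - 1)
    return word, bool(syndrome)


def hamming74_decode(word: int) -> int:
    word, _ = hamming74_correct(word)
    positions = (3, 5, 6, 7)
    return sum(((word >> (p - 1)) & 1) << i for i, p in enumerate(positions))


def patrol_scrub(memory: List[int]) -> int:
    # Patrol pass: read every word, correct it, and write it back so that
    # a later upset does not add to an uncorrected earlier one.
    corrected = 0
    for address, word in enumerate(memory):
        fixed, had_error = hamming74_correct(word)
        if had_error:
            memory[address] = fixed
            corrected += 1
    return corrected
```

With this code, a single upset per word is always corrected on read. Two upsets that accumulate in the same word between patrol passes are miscorrected, which is why the patrol interval matters.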

Datapath analysis is valuable because the designer does not need in-depth knowledge of the functional safety standard to generate the dataset required by the system integrator. The designer's focus shifts from standards implementation issues to functional design plus diagnostic coverage.

This systematic approach breaks a design down into a set of elements (buses and functions) based on the first principles, following the flow of data from the top down, recursively through the design hierarchy.
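The recursive breakdown can be sketched as a tree walk. The following is a minimal illustration under stated assumptions: the `Element` structure, the `walk` function, and the example design (`dma_engine` and its children) are hypothetical, and the diagnostic text for each first principle paraphrases the earlier paragraphs of this section.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Iterator, List, Optional, Tuple


class Principle(Enum):
    TRANSFORMATION = "data transformation"
    ANALYSIS = "data analysis"
    TRANSPORTATION = "data transportation"
    STORAGE = "data storage"


# Candidate diagnostics per first principle, as described in this section.
DIAGNOSTICS = {
    Principle.TRANSFORMATION: "temporal diversity (transient) or redundancy (permanent)",
    Principle.ANALYSIS: "temporal diversity (transient) or redundancy (permanent)",
    Principle.TRANSPORTATION: "check bits on the data packet",
    Principle.STORAGE: "check bits, plus patrolling for transient errors",
}


@dataclass
class Element:
    name: str
    principle: Optional[Principle] = None  # set on leaf elements only
    children: List["Element"] = field(default_factory=list)


def walk(element: Element, path: str = "") -> Iterator[Tuple[str, Principle, str]]:
    # Descend the hierarchy top down; report each leaf with its diagnostic.
    full_path = f"{path}/{element.name}" if path else element.name
    if element.children:
        for child in element.children:
            yield from walk(child, full_path)
    elif element.principle is not None:
        yield full_path, element.principle, DIAGNOSTICS[element.principle]
    else:
        raise ValueError(f"leaf element {full_path} has no first principle assigned")


# Hypothetical design used only to exercise the walk.
design = Element("dma_engine", children=[
    Element("axi_bus", Principle.TRANSPORTATION),
    Element("descriptor_ram", Principle.STORAGE),
    Element("alu", children=[
        Element("adder", Principle.TRANSFORMATION),
        Element("zero_compare", Principle.ANALYSIS),
    ]),
])
```

Iterating over `walk(design)` yields one row per leaf element, which is the kind of per-element dataset a designer could hand to the system integrator. A leaf without an assigned principle raises an error, so unanalyzed elements are not silently skipped.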