Overview

Semi-Ternary CAM Search v5.0 LogiCORE IP Product Guide (PG319)

Document ID
PG319
Release Date
2025-11-26
Version
5.0 English

The Semi-Ternary CAM Search IP core (STCAM) is a member of the family of CAMs provided by AMD. The family consists of the following members:

Binary CAM (BCAM)
Used for exact matching, BCAM is available in two versions: a software-managed version and a hardware-managed version (CBCAM). CBCAM offers you the flexibility to insert or delete entries using a hardware interface, with or without a software driver. For more information, see Binary CAM Search LogiCORE IP Product Guide (PG317).
Cached DRAM Binary CAM (CDBCAM)
Used for exact matching, CDBCAM is similar to BCAM except that it uses DRAM as the primary storage for entries, whereas BCAM uses URAM or block RAM (BRAM). CDBCAM can store more entries and, in combination with its on-chip BCAM cache, can achieve lookup rates comparable to the BCAM. Like the BCAM, the CDBCAM supports both a software-managed and a hardware-managed interface. For further information, see Cached DRAM Binary CAM LogiCORE IP Product Guide (PG427).
Semi TCAM (STCAM)
Described in this document. It is also available in two versions: one with fixed rate and latency, and one with variable rate and latency for low-cost applications. The low-cost version supports range matching, which avoids the costly expansion of a range into many entries. The STCAM is fully flexible in terms of the number, size, and position of wildcard (ignored) fields. Every key bit has a corresponding mask bit. The number of allowed unique masks is, however, limited, which allows for considerable memory and logic optimizations. For LPM applications, the LPM mode of the variable rate STCAM uses special hardware to compress keys, which improves storage efficiency. The LPM mode uses prefix masks instead of fully flexible masks, which allows for more masks at a lower cost.
DRAM Semi-Ternary CAM (DSTCAM)
DSTCAM is similar to STCAM except that it uses DRAM for storage of entries. The DSTCAM is flexible in terms of the number, size, and position of wildcard fields. Every key bit has a corresponding mask bit. The number of allowed masks is, however, limited, which allows for considerable memory and logic optimizations. For LPM applications, the DSTCAM uses special hardware to compress keys, which improves storage efficiency. The LPM mode uses prefix masks rather than fully flexible masks, which reduces cost. See the DRAM Semi-Ternary CAM LogiCORE IP Product Guide (PG468).
Ternary CAM (TCAM)
The primary usage of TCAM is tables requiring full flexibility in terms of size and position of wildcard (ignored) fields. Every key bit has a corresponding mask bit stored together with the key. All entries can have different masks. TCAMs are used for Access Control List (ACL) type of lookups, requiring a large number of different masks. See the Ternary CAM Search LogiCORE IP Product Guide (PG318).

One or multiple instances of each type can be used inside the same FPGA. Different types can also be mixed inside the same FPGA. Each CAM type is optimized for its specific task in terms of hardware resource usage.

The STCAM stores (key, mask, priority, response) entries in either UltraRAM (URAM) or block RAM.

The Lookup interface of the STCAM receives a lookup key and outputs a result that contains a match flag indicating whether the masked lookup key matches the masked key of any entry in the STCAM. The mask has the same width as the key. A cleared mask bit causes the corresponding key bit to be ignored. Both the lookup key and the stored key are bit-wise ANDed with the mask before the bit-wise comparison. The STCAM is pipelined so that it can process a Lookup Request every clock cycle.
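
The masked comparison described above can be sketched in a few lines of Python. This is only a behavioral illustration, not the hardware implementation; the function name `masked_match` and the 16-bit example values are assumptions made for this sketch.

```python
def masked_match(lookup_key: int, entry_key: int, mask: int) -> bool:
    """Return True when both keys agree on every bit whose mask bit is set.

    Bits whose mask bit is cleared are ignored, because both keys are
    ANDed with the mask before they are compared.
    """
    return (lookup_key & mask) == (entry_key & mask)


# Hypothetical 16-bit example: compare the upper byte, ignore the lower byte.
MASK = 0xFF00
ENTRY_KEY = 0xAB00

hit = masked_match(0xABCD, ENTRY_KEY, MASK)   # upper byte 0xAB is equal
miss = masked_match(0xACCD, ENTRY_KEY, MASK)  # upper byte 0xAC differs
```

Here `hit` is true because the two keys differ only in bits that the mask clears, while `miss` is false because they differ in a bit that the mask keeps.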

If multiple entries match, the response value of the matching entry with the lowest priority is output. If two entries have the same priority, one of them is arbitrarily picked as the winner. The API software ensures that two entries with the same masked key value cannot be inserted.
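
A minimal software model of this selection rule is sketched below. The `Entry` type and the `lookup` function are hypothetical names used only for illustration, and a linear scan stands in for the parallel hardware match. On a priority tie this sketch happens to return the first matching entry in list order, which is one possible arbitrary choice and not necessarily the one the core makes.

```python
from typing import List, NamedTuple, Optional, Tuple


class Entry(NamedTuple):
    key: int
    mask: int
    priority: int
    response: int


def lookup(entries: List[Entry], lookup_key: int) -> Tuple[bool, Optional[int]]:
    """Return (match flag, response) for a lookup key."""
    matching = [e for e in entries
                if (lookup_key & e.mask) == (e.key & e.mask)]
    if not matching:
        return (False, None)
    # The matching entry with the lowest priority value supplies the response.
    winner = min(matching, key=lambda e: e.priority)
    return (True, winner.response)


table = [
    Entry(key=0xAB00, mask=0xFF00, priority=5, response=1),
    Entry(key=0xABC0, mask=0xFFF0, priority=2, response=2),
]

both_match = lookup(table, 0xABCD)   # both entries match, priority 2 wins
one_match = lookup(table, 0xAB12)    # only the first entry matches
no_match = lookup(table, 0x1234)     # no entry matches
```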

The entries are read and written using a driver consisting of a set of high-level API functions. The API functions are written in C and delivered as part of the IP. The API encapsulates the details of memory management and register access and provides a simple and efficient management interface. The API software, with detailed documentation, is found on the CAM IP product page. You must add basic hardware read and write functions to the end of the driver file. This allows for flexible hardware mapping, and the communication link between the API software and the hardware can be designed to your specifications. The communication link could, for instance, be AXI4-Lite or PCIe®.

Following are the main functions of the STCAM API:

  • stcam_create
  • stcam_destroy
  • stcam_insert
  • stcam_delete
  • stcam_update

The arguments for stcam_insert, stcam_delete, and stcam_update are:

  • key
  • mask
  • priority
  • response
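
The entry management rules stated in this overview (a limited number of unique masks, and no two entries with the same masked key) can be illustrated with a toy software model. This is not the delivered C API: the class `ToyStcam`, its method signatures, and its error handling are assumptions made for this sketch. In particular, the model identifies an entry by its masked key and mask only, and treats "same masked key" as "same masked key under the same mask".

```python
class ToyStcam:
    """Hypothetical software-only model of STCAM entry management."""

    def __init__(self, num_entries: int, num_masks: int):
        self.num_entries = num_entries
        self.num_masks = num_masks
        self.entries = {}  # (masked key, mask) -> (priority, response)

    def insert(self, key: int, mask: int, priority: int, response: int) -> None:
        slot = (key & mask, mask)
        if slot in self.entries:
            raise ValueError("an entry with this masked key already exists")
        if len(self.entries) >= self.num_entries:
            raise ValueError("table is full")
        masks_in_use = {m for (_, m) in self.entries}
        if mask not in masks_in_use and len(masks_in_use) >= self.num_masks:
            raise ValueError("too many unique masks")
        self.entries[slot] = (priority, response)

    def update(self, key: int, mask: int, priority: int, response: int) -> None:
        slot = (key & mask, mask)
        if slot not in self.entries:
            raise KeyError("no such entry")
        self.entries[slot] = (priority, response)

    def delete(self, key: int, mask: int) -> None:
        del self.entries[(key & mask, mask)]


cam = ToyStcam(num_entries=4, num_masks=1)
cam.insert(key=0xAB00, mask=0xFF00, priority=1, response=10)
cam.update(key=0xAB00, mask=0xFF00, priority=1, response=20)
```

Inserting `0xABFF` with mask `0xFF00` into `cam` would be rejected as a duplicate, because it has the same masked key as the existing entry, and inserting any entry with a second mask would be rejected because `num_masks` is 1.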

LPM mode uses a simplified API using prefix length instead of the mask and priority arguments. For LPM mode, the main functions are:

  • lpm_create
  • lpm_destroy
  • lpm_insert
  • lpm_delete
  • lpm_update
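
To show how a prefix length can stand in for both the mask and the priority, the sketch below derives a prefix mask from a prefix length and selects the longest matching prefix. It is a behavioral illustration only: the helper names `prefix_mask` and `lpm_lookup` are hypothetical, and the way the core encodes prefixes internally is not described here.

```python
def prefix_mask(key_width: int, prefix_len: int) -> int:
    """Return a mask with the top prefix_len bits of a key_width-bit key set."""
    if not 0 <= prefix_len <= key_width:
        raise ValueError("prefix length out of range")
    return ((1 << prefix_len) - 1) << (key_width - prefix_len)


def lpm_lookup(routes, key_width: int, address: int):
    """Return the response of the longest matching prefix, or None.

    routes is a list of (prefix, prefix_len, response) tuples.
    """
    best = None
    for prefix, prefix_len, response in routes:
        mask = prefix_mask(key_width, prefix_len)
        if (address & mask) == (prefix & mask):
            if best is None or prefix_len > best[0]:
                best = (prefix_len, response)
    return None if best is None else best[1]


routes = [
    (0x0A000000, 8, 1),    # 10.0.0.0/8
    (0x0A010000, 16, 2),   # 10.1.0.0/16
]

specific = lpm_lookup(routes, 32, 0x0A010203)   # 10.1.2.3 matches both
general = lpm_lookup(routes, 32, 0x0A020304)    # 10.2.3.4 matches /8 only
unknown = lpm_lookup(routes, 32, 0x0B000000)    # 11.0.0.0 matches nothing
```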

The driver maintains a CPU shadow of the CAM Database. This eliminates high-latency read operations from the CPU to the CAM Database. It also allows the interaction between the control plane and the driver to be tested without any hardware.
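
The shadow idea can be sketched as a write-through copy kept on the CPU side. The class `ShadowedTable`, the `hw_write` callback signature, and the assumption that unwritten locations read as zero are all hypothetical; the delivered driver is written in C and its internal structure is not described in this overview.

```python
class ShadowedTable:
    """Hypothetical write-through shadow of a hardware table."""

    def __init__(self, hw_write):
        # hw_write is a user-supplied callable taking (address, data).
        self._hw_write = hw_write
        self._shadow = {}

    def write(self, address: int, data: int) -> None:
        self._hw_write(address, data)   # update the hardware
        self._shadow[address] = data    # keep the CPU copy in step

    def read(self, address: int) -> int:
        # Served from the shadow, so no hardware read is issued.
        # Assumption: locations never written read back as 0.
        return self._shadow.get(address, 0)


# Without hardware, the write function can simply record what it is asked to do.
hw_log = []
table = ShadowedTable(hw_write=lambda address, data: hw_log.append((address, data)))
table.write(0x10, 0xDEADBEEF)
value = table.read(0x10)
```

Because the write function is injected, the same control-plane code can run against a recording stub such as `hw_log` in a test and against a real register access function on the target.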

The STCAM design is highly configurable at compile time to make it suitable for a large variety of applications. For LPM mode, most configuration parameters can be omitted; the reduced set of parameters is listed in Table 2. The following table lists the full set of configuration parameters.
Table 1. Configuration Parameters

KEY_WIDTH
Valid range: 10–992 bits
The width of the lookup key.
KEY_WIDTH + RESPONSE_WIDTH + PRIORITY_WIDTH + CTRL cannot exceed 2048, where:
  • CTRL = 1 (VARIABLE_RATE = FALSE)
  • CTRL = up to 11 + total width of ranges (VARIABLE_RATE = TRUE)

RESPONSE_WIDTH
Valid range: 1–1024 bits
The width of the lookup response.
KEY_WIDTH + RESPONSE_WIDTH + PRIORITY_WIDTH + CTRL cannot exceed 2048, where:
  • CTRL = 1 (VARIABLE_RATE = FALSE)
  • CTRL = up to 11 + total width of ranges (VARIABLE_RATE = TRUE)

VARIABLE_RATE
Valid range: TRUE/FALSE
  • VARIABLE_RATE = FALSE: Fixed lookup rate and latency.
  • VARIABLE_RATE = TRUE: Cost optimized with variable lookup rate and latency. Only the variable rate version supports range matching.

FORMAT_STRING
Valid range: NA
Range matching requires a format string. The key width is derived from the format string and does not need to be specified. For more information, see section Format String in Designing with the Core.

PRIORITY_WIDTH
Valid range: 0–32 bits
The width of the priority assigned to each entry.

NUM_MASKS
Valid range: 1–256 (VARIABLE_RATE = FALSE), 1–1024 (VARIABLE_RATE = TRUE)
The number of unique masks. The CAM compiler generates a CAM supporting both the specified number of unique masks and the specified number of entries at the same time.
NUM_MASKS can be omitted if a format string is used. If omitted, NUM_MASKS is set to a default value based on conservative ACLs using NUM_ENTRIES rules.

NUM_ENTRIES
Valid range: 1–1M
The supported number of entries (depth). The CAM compiler generates a CAM supporting both the specified number of unique masks and the specified number of entries at the same time.

MEMORY_PRIMITIVE
Valid range: BLOCK or ULTRA or AUTO
The compiler selects the best suited type automatically. This can however be overridden as a user preference.

LOOKUP_RATE
Valid range: 1–600 Mlps
This is the supported lookup rate of the instance (expressed in million lookups per second). To save resources it is important not to set the lookup rate higher than required.
For VARIABLE_RATE = TRUE, the specified LOOKUP_RATE is sustained if the following conditions are met for the lookup key:
  • Max 2 overlapping rules on average for TDM_FACTOR = 1
  • Max TDM_FACTOR overlapping rules on average for TDM_FACTOR > 1

LOOKUP_INTERFACE_FREQ
Valid range: 1–600 MHz
This is the clock frequency of the Lookup Request and response interfaces.
LOOKUP_INTERFACE_FREQ >= LOOKUP_RATE

RAM_FREQ
Valid range: 1–600 MHz
This is the clock frequency of the memories and the internal datapath. An optional, high frequency RAM clock enables time division of the hardware resources, leading to significant savings. See the TDM_FACTOR parameter.

TDM_FACTOR
Valid range: 1–256
The TDM_FACTOR is calculated as RAM_FREQ / LOOKUP_RATE. The ratio is rounded downwards to the nearest power of two.
Example: RAM clock frequency = 600 MHz, lookup rate = 150 Mlps → TDM_FACTOR = 600 / 150 = 4
The RAM can be accessed four times per lookup, saving up to four times the RAM and logic resources for small table configurations.

CLOCKING_MODE
Valid range: SINGLE_CLOCK or DUAL_CLOCK
The use of a separate RAM clock is optional. If RAM_FREQ = LOOKUP_INTERFACE_FREQ, then the single clock mode is enabled. In single clock mode, only the lookup interface clock is used for lookup interfaces, RAM and match logic.
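
Two of the rules in Table 1 are simple arithmetic and can be checked before generating the core. The sketch below is an illustration only, not part of the CAM compiler: the function names are hypothetical, the variable rate CTRL width is taken at its stated upper bound of 11 plus the total width of ranges, and rejecting ratios outside the documented TDM_FACTOR range of 1 to 256 is an assumption of this sketch.

```python
def tdm_factor(ram_freq_mhz: int, lookup_rate_mlps: int) -> int:
    """RAM_FREQ / LOOKUP_RATE, rounded down to the nearest power of two."""
    ratio = int(ram_freq_mhz // lookup_rate_mlps)
    if ratio < 1:
        # Assumption: a ratio below 1 has no valid TDM_FACTOR.
        raise ValueError("RAM_FREQ is lower than LOOKUP_RATE")
    factor = 1 << (ratio.bit_length() - 1)
    if factor > 256:
        # Assumption: values above the documented range are rejected.
        raise ValueError("TDM_FACTOR exceeds the valid range 1-256")
    return factor


def width_budget_ok(key_width: int, response_width: int, priority_width: int,
                    variable_rate: bool = False, range_width: int = 0) -> bool:
    """Check KEY_WIDTH + RESPONSE_WIDTH + PRIORITY_WIDTH + CTRL <= 2048."""
    # Worst-case CTRL width for the variable rate version.
    ctrl = 11 + range_width if variable_rate else 1
    return key_width + response_width + priority_width + ctrl <= 2048


example = tdm_factor(600, 150)        # the example from Table 1
rounded = tdm_factor(600, 100)        # ratio 6 rounds down to 4
fits = width_budget_ok(992, 1024, 31)
too_wide = width_budget_ok(992, 1024, 32)
```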
Table 2. LPM Mode Configuration Parameters

KEY_WIDTH
Valid range: 10–256 bits
The width of the lookup key.
KEY_WIDTH + 8 x RESPONSE_WIDTH + CTRL cannot exceed 512, where CTRL = 32 to 63.

RESPONSE_WIDTH
Valid range: 2–32 bits
The width of the lookup response.
KEY_WIDTH + 8 x RESPONSE_WIDTH + CTRL cannot exceed 512, where CTRL = 32 to 63.

FORMAT_STRING
Valid range: NA
For more information, see section Format String in Designing with the Core.

NUM_ENTRIES
Valid range: 1–1M
The supported number of guaranteed keys/prefixes (depth). In LPM mode, keys are compressed. This means that the actual number of keys that fit in the CAM is between NUM_ENTRIES and 8 x NUM_ENTRIES. For large IPv4 BGP routing tables (RouteViews), 2.5 x NUM_ENTRIES keys fit.

MEMORY_PRIMITIVE
Valid range: BLOCK or ULTRA or AUTO
The compiler selects the best suited type automatically. This can however be overridden as a user preference.

LOOKUP_RATE
Valid range: 1–600 Mlps
This is the supported lookup rate of the instance (expressed in million lookups per second). To save resources it is important not to set the lookup rate higher than required.

LOOKUP_INTERFACE_FREQ
Valid range: 1–600 MHz
This is the clock frequency of the Lookup Request and response interfaces.
LOOKUP_INTERFACE_FREQ >= LOOKUP_RATE

RAM_FREQ
Valid range: 1–600 MHz
This is the clock frequency of the memories and the internal datapath. An optional, high frequency RAM clock enables time division of the hardware resources, leading to significant savings. See the TDM_FACTOR parameter.

TDM_FACTOR
Valid range: 1–256
The TDM_FACTOR is calculated as RAM_FREQ / LOOKUP_RATE.
Example: RAM clock frequency = 360 MHz, lookup rate = 150 Mlps → TDM_FACTOR = 360 / 150 = 2.4
The RAM can be accessed two times per lookup, saving up to two times the RAM and logic resources for small table configurations.
Because 360 / (2 x 150) = 1.2, there is also an additional 20% of spare bandwidth available.

CLOCKING_MODE
Valid range: SINGLE_CLOCK or DUAL_CLOCK
The use of a separate RAM clock is optional. If RAM_FREQ = LOOKUP_INTERFACE_FREQ, then the single clock mode is enabled. In single clock mode, only the lookup interface clock is used for lookup interfaces, RAM and match logic.
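
The LPM mode width rule and the spare bandwidth figure from the TDM_FACTOR example can be reproduced with the short sketch below. The function names are hypothetical, CTRL defaults to its stated upper bound of 63 as a conservative choice, and applying the power-of-two rounding from Table 1 to LPM mode is an assumption (it agrees with the example, where 2.4 becomes 2).

```python
def lpm_width_budget_ok(key_width: int, response_width: int, ctrl: int = 63) -> bool:
    """Check KEY_WIDTH + 8 x RESPONSE_WIDTH + CTRL <= 512."""
    return key_width + 8 * response_width + ctrl <= 512


def ram_accesses_and_spare(ram_freq_mhz: float, lookup_rate_mlps: float):
    """Return (RAM accesses per lookup, spare bandwidth as a fraction)."""
    whole = int(ram_freq_mhz / lookup_rate_mlps)
    if whole < 1:
        raise ValueError("RAM_FREQ is lower than LOOKUP_RATE")
    # Assumption: round down to a power of two, as stated for Table 1.
    accesses = 1 << (whole.bit_length() - 1)
    spare = ram_freq_mhz / (accesses * lookup_rate_mlps) - 1.0
    return accesses, spare


accesses, spare = ram_accesses_and_spare(360, 150)  # the example from Table 2
ipv4_fits = lpm_width_budget_ok(key_width=32, response_width=32)
widest_fits = lpm_width_budget_ok(key_width=256, response_width=32)
```

For the example values, `accesses` is 2 and `spare` is approximately 0.2, matching the 20% quoted in the table. The widest key combined with the widest response does not fit within the 512-bit limit, so the two maxima cannot be used together.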

All these parameters are extracted from the P4 code and VitisNetP4 tool during compilation. If STCAM is used without P4, these parameters need to be set prior to generating the hardware STCAM or calling the software API. VitisNetP4 ensures that the parameters used to generate the hardware STCAM and those used to create the software STCAM instance are synchronized. For standalone usage, you must guarantee that the parameters used to generate the hardware STCAM and the parameters used to call the software API are identical.
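
For standalone usage, one way to guard against mismatched parameters is to compare the two parameter sets before creating the software instance. The sketch below assumes the parameters are available as plain dictionaries keyed by parameter name; the function `parameter_mismatches` and this representation are hypothetical and not part of the delivered API.

```python
def parameter_mismatches(hw_params: dict, sw_params: dict) -> list:
    """Return the sorted names of parameters that differ or are missing on one side."""
    names = sorted(set(hw_params) | set(sw_params))
    return [name for name in names if hw_params.get(name) != sw_params.get(name)]


hw_params = {"KEY_WIDTH": 128, "RESPONSE_WIDTH": 32, "NUM_ENTRIES": 4096}
sw_params = {"KEY_WIDTH": 128, "RESPONSE_WIDTH": 16, "NUM_ENTRIES": 4096}

differences = parameter_mismatches(hw_params, sw_params)
```

An empty result means the two sets agree. In this example, `differences` names `RESPONSE_WIDTH`, the one parameter whose values disagree.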