CAM Algorithms are Improving

Advances in High-Capacity Algorithmic CAMs on AMD Versal Devices (WP570)

Document ID
WP570
Release Date
2026-05-01
Revision
1.0 English

An algorithmic CAM stores key-value pairs at specific memory addresses so searches retrieve them with a small number of memory reads.

Figure 1. Evolution of the AMD CAM IP Portfolio

The AMD CAM intellectual property (IP) portfolio, first introduced in 2019, uses N-way cuckoo hashing. This approach lets exact-match applications complete a search within N memory reads. Choosing N = 4 helps achieve very high load factors. The initial four-way cuckoo-hashing CAMs used embedded FPGA SRAMs, including block RAM and UltraRAM, and supported read rates of up to 600 million per second. Because the four reads can be issued in parallel on block RAMs or UltraRAMs, peak exact-match search rates of 600 million searches per second (MS/s) were attainable using AMD CAM IP.
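The bounded lookup cost of N-way cuckoo hashing can be illustrated with a toy software model. This is a minimal sketch under stated assumptions, not the AMD implementation: the class name `CuckooCam`, the salted BLAKE2b hash per way, the one-entry-per-slot layout, and the round-robin eviction policy are all chosen here for illustration only.

```python
import hashlib


class CuckooCam:
    """Toy N-way cuckoo hash table (illustrative only, not the AMD design)."""

    def __init__(self, ways=4, slots_per_way=64, max_kicks=200):
        self.ways = ways
        self.slots = slots_per_way
        self.max_kicks = max_kicks
        # One independent array per way; each slot holds (key, value) or None.
        self.table = [[None] * slots_per_way for _ in range(ways)]

    def _index(self, way, key):
        # Assumed hash: BLAKE2b salted with the way number, so each way
        # maps the same key to an independent slot.
        digest = hashlib.blake2b(key, digest_size=8, salt=bytes([way])).digest()
        return int.from_bytes(digest, "little") % self.slots

    def lookup(self, key):
        # A key can only live in one candidate slot per way: at most N reads.
        for way in range(self.ways):
            entry = self.table[way][self._index(way, key)]
            if entry is not None and entry[0] == key:
                return entry[1]
        return None

    def insert(self, key, value):
        # Overwrite in place if the key is already stored.
        for way in range(self.ways):
            idx = self._index(way, key)
            entry = self.table[way][idx]
            if entry is not None and entry[0] == key:
                self.table[way][idx] = (key, value)
                return True
        item, swaps = (key, value), []
        for kick in range(self.max_kicks):
            # Place the item in any empty candidate slot.
            for way in range(self.ways):
                idx = self._index(way, item[0])
                if self.table[way][idx] is None:
                    self.table[way][idx] = item
                    return True
            # All candidates are full: evict one occupant and re-place it.
            way = kick % self.ways
            idx = self._index(way, item[0])
            item, self.table[way][idx] = self.table[way][idx], item
            swaps.append((way, idx))
        # Give up: undo the swaps in reverse so no stored entry is lost.
        for way, idx in reversed(swaps):
            item, self.table[way][idx] = self.table[way][idx], item
        return False
```

In this model the insert path does all the relocation work, which is what keeps every search at no more than `ways` reads. In hardware the candidate slots can be read in parallel rather than in a loop.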

The AMD CAM IP also supports partial-match searches by introducing a bit mask, the same width as the key, that designates which bits are compared. This implementation represents a partial-match CAM as an aggregation of exact-match CAMs, each assigned a distinct mask. With M masks, the system executes M exact-match searches, each potentially requiring up to four reads. Despite the higher read count (up to 4 x M reads per partial-match search), the inherent parallelism of the FPGA memory architecture sustains high throughput, allowing partial-match search rates of up to 600 MS/s, although typically at a greater cost than exact matches.
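The aggregation of exact-match tables described above can be sketched in software as follows. This is a hypothetical model: the class name `MaskedCam` is invented here, a Python `dict` stands in for each exact-match CAM, and the rule that the first configured mask with a hit wins is an assumption of this sketch rather than documented IP behavior.

```python
class MaskedCam:
    """Partial-match CAM modeled as one exact-match table per mask."""

    def __init__(self, masks):
        # List order sets match priority (an assumption of this sketch).
        self.tables = [(mask, {}) for mask in masks]

    def insert(self, key, mask, value):
        for table_mask, table in self.tables:
            if table_mask == mask:
                # Store only the bits the mask selects.
                table[key & mask] = value
                return
        raise ValueError("mask is not configured")

    def search(self, key):
        # One exact-match search per mask: M searches in total.
        for mask, table in self.tables:
            hit = table.get(key & mask)
            if hit is not None:
                return hit
        return None


def worst_case_reads(num_masks, ways=4):
    """Upper bound on memory reads when each exact-match search needs `ways` reads."""
    return num_masks * ways
```

For example, with the masks `0xFFFF` and `0xFF00` configured, a key of `0x12AB` misses the full-width table but matches an entry stored as `0x1200` under `0xFF00`. The loop is sequential only because this is software; the M searches are independent and can proceed in parallel.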

As CAM data set sizes increase, the capacity limitations and expense associated with block RAM and UltraRAM become apparent. Therefore, in 2023, AMD expanded its CAM IP to enable exact-match searches in DDR4 SDRAM on Versal platforms.

DRAM presents inherent challenges due to reduced read bandwidth and limited parallelism. The initial CAM IP release for DRAM employed the four-way cuckoo hashing algorithm. This design achieves search rates of approximately 30 MS/s on a single 72-bit DDR4-2400 channel with 64-byte entry sizes. To address this limitation, the design introduces an optional cache for popular entries. When cache hit rates are sufficiently high, search performance rivals that of SRAM-based CAMs. Conversely, random key searches with ineffective caching remain constrained to 30 MS/s.
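A back-of-the-envelope model helps put these numbers in context. The sketch below rests on several assumptions that are not stated in the text: that the 72-bit channel carries 64 data bits plus 8 ECC bits, that a cache hit is served at the 600 MS/s SRAM rate, and that only cache misses consume DRAM search capacity. The function names are hypothetical.

```python
def search_ceiling(transfers_per_s, data_bits, entry_bytes, reads_per_search):
    """Upper bound on searches/s if every data-bus transfer carried entry data."""
    bytes_per_s = transfers_per_s * data_bits / 8
    return bytes_per_s / entry_bytes / reads_per_search


def cached_search_rate(hit_rate, cache_rate, dram_rate):
    """Assumed model: hits are served by the cache, only misses reach DRAM."""
    miss_rate = 1.0 - hit_rate
    if miss_rate <= 0.0:
        return cache_rate
    return min(cache_rate, dram_rate / miss_rate)


# DDR4-2400, 64 data bits (assumed), 64-byte entries, four reads per search.
ceiling = search_ceiling(2400e6, 64, 64, 4)
print(f"DDR4-2400 four-read ceiling: {ceiling / 1e6:.0f} MS/s")

for hit_rate in (0.0, 0.5, 0.9, 0.95):
    rate = cached_search_rate(hit_rate, cache_rate=600e6, dram_rate=30e6)
    print(f"hit rate {hit_rate:.2f}: {rate / 1e6:.0f} MS/s")
```

Under these assumptions the theoretical four-read ceiling is 75 MS/s, so the quoted figure of about 30 MS/s sits well below the raw bus limit. The text does not break down where the difference goes, and the cache model is deliberately simple, so treat the output as an order-of-magnitude guide only.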

In 2024, AMD added support for a fast cuckoo-hashing method within its IP. This advancement decreases the average number of reads per search from four to nearly one without caching. It enables exact match search rates approaching 100 MS/s on a 72-bit DDR4-2400 interface. Further development has enabled full utilization of LPDDR5 benefits. The following table provides a comparative summary of achievable exact-match search rates across various algorithms and memory channels.

Table 1. Exact Match Search Rates (64-Byte Entry Size, No Cache)

| Memory Channel | Fast Cuckoo-Hashing | 4-way Cuckoo-Hashing |
|---|---|---|
| 72-bit DDR4-2400 | 92 MS/s | 30 MS/s |
| 16-bit LPDDR5-6400 (inline ECC) | 77 MS/s | 25 MS/s |
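The table values can be turned into two derived figures: the speedup of fast cuckoo hashing over the four-way method, and the entry payload bandwidth each search rate implies. The short sketch below does only this arithmetic. The dictionary layout and function names are hypothetical, and the payload figure assumes exactly one 64-byte read per search, whereas the text says "nearly one".

```python
# Search rates from Table 1, in millions of searches per second (MS/s).
RATES_MS = {
    "72-bit DDR4-2400": {"fast": 92, "four_way": 30},
    "16-bit LPDDR5-6400 (inline ECC)": {"fast": 77, "four_way": 25},
}


def speedup(channel):
    """Ratio of fast cuckoo hashing to four-way cuckoo hashing search rates."""
    row = RATES_MS[channel]
    return row["fast"] / row["four_way"]


def payload_gbytes_per_s(rate_ms, entry_bytes=64, reads_per_search=1):
    """Entry data moved per second, assuming a fixed read count per search."""
    return rate_ms * 1e6 * entry_bytes * reads_per_search / 1e9


for channel, row in RATES_MS.items():
    print(
        f"{channel}: {speedup(channel):.2f}x faster, "
        f"{payload_gbytes_per_s(row['fast']):.2f} GB/s of entry data"
    )
```

Both channels show a speedup of slightly more than three times.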

In 2025, the CAM IP portfolio added an option that performs high-speed longest prefix match (LPM) searches on million-entry tables in DRAM. Supporting the complete spectrum of 128 IPv6 prefix lengths requires 128 distinct masks. With pure four-way cuckoo hashing, this would take up to 512 (4 x 128) reads per search, leading to suboptimal throughput.
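The cost of the mask-per-prefix-length approach can be seen in a small software model. This is a minimal sketch of the naive baseline only, not of the AMD LPM option: the class name `NaiveLpm` is hypothetical, a `dict` stands in for each exact-match table, and probing the populated prefix lengths from longest to shortest is an assumption of this sketch. It handles IPv6 addresses only.

```python
import ipaddress


class NaiveLpm:
    """Longest prefix match built from one exact-match table per prefix length."""

    WIDTH = 128  # IPv6 address width in bits

    def __init__(self):
        self.tables = {}  # prefix length -> {masked address: value}

    def _mask(self, length):
        # Keep the top `length` bits of a 128-bit address.
        return ((1 << length) - 1) << (self.WIDTH - length)

    def insert(self, prefix, value):
        net = ipaddress.IPv6Network(prefix)
        table = self.tables.setdefault(net.prefixlen, {})
        table[int(net.network_address)] = value

    def search(self, address):
        addr = int(ipaddress.IPv6Address(address))
        probes = 0
        # Probe the longest prefix first so the first hit is the best match.
        for length in sorted(self.tables, reverse=True):
            probes += 1
            hit = self.tables[length].get(addr & self._mask(length))
            if hit is not None:
                return hit, probes
        return None, probes


def worst_case_reads(num_masks=128, ways=4):
    """Reads per search if every mask needs a full N-way cuckoo lookup."""
    return num_masks * ways
```

With routes for `2001:db8::/32` and `2001:db8:1::/48`, a search for `2001:db8:1::5` hits on the first probe, while `2001:db8:2::5` falls through to the shorter prefix on the second. The probe count grows with the number of populated prefix lengths, which is the cost the 512-read worst case describes.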

The AMD solution integrates multiple techniques that collectively lower reads per search to under 1.1 on average for typical LPM tables. This advancement allows LPM search rates to meet or exceed those achieved by exact-match searches. Notably, a single 16-bit LPDDR5 channel provides sufficient bandwidth to sustain rates of 90 MS/s.