Large Scale Exact Match Search—Performance and Costs

Advances in High-Capacity Algorithmic CAMs on AMD Versal Devices (WP570)

Document ID: WP570
Release Date: 2026-05-01
Revision: 1.0 English

AMD enhancements to the 4-way cuckoo hashing algorithm reduce the average search from 4 memory reads to 1.3, even at high load factors. These improvements enable exact match rates of up to 90 million searches per second (MS/s) on a 72-bit DDR4-2400 channel and up to 150 MS/s on a 2 x 16-bit LPDDR5-6400 channel. The higher search rates come with trade-offs:

  • Moderate increase in search latency
  • Moderate rise in LUT and block RAM usage
  • Key and value size reduced by 16 bits
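For orientation, the baseline that these enhancements start from can be sketched as a textbook 4-way cuckoo hash table. This is a minimal illustration, not the AMD implementation: the class name `CuckooTable`, the salted BLAKE2b hash, the random-walk eviction policy, and the `max_kicks` limit are all assumptions chosen for the example. The lookup reads all four candidate slots, which is assumed here to correspond to the 4 reads per search quoted above. The enhanced method that brings the average down to 1.3 reads is not reproduced.

```python
import hashlib
import random

WAYS = 4  # 4-way cuckoo hashing: each key has four candidate slots


class CuckooTable:
    """Textbook d-ary cuckoo hash table (illustrative only, not the AMD design)."""

    def __init__(self, slots_per_way, max_kicks=500, seed=0):
        self.slots_per_way = slots_per_way
        self.max_kicks = max_kicks
        self.ways = [[None] * slots_per_way for _ in range(WAYS)]
        self.reads = 0  # memory reads issued by lookups
        self.rng = random.Random(seed)

    def _index(self, key, way):
        # Hypothetical hash function: one salted BLAKE2b digest per way.
        digest = hashlib.blake2b(key, digest_size=8, salt=bytes([way])).digest()
        return int.from_bytes(digest, "little") % self.slots_per_way

    def insert(self, key, value):
        # Update in place if the key is already stored.
        for way in range(WAYS):
            idx = self._index(key, way)
            entry = self.ways[way][idx]
            if entry is not None and entry[0] == key:
                self.ways[way][idx] = (key, value)
                return True
        item = (key, value)
        for _ in range(self.max_kicks):
            for way in range(WAYS):
                idx = self._index(item[0], way)
                if self.ways[way][idx] is None:
                    self.ways[way][idx] = item
                    return True
            # All four candidates are full: evict a random victim and re-place it.
            way = self.rng.randrange(WAYS)
            idx = self._index(item[0], way)
            item, self.ways[way][idx] = self.ways[way][idx], item
        # Gave up: the entry still in hand is dropped in this sketch.
        # A real design would rehash or keep a small overflow store.
        return False

    def lookup(self, key):
        # Assumed baseline behaviour: read every candidate slot,
        # so each search costs exactly WAYS reads.
        result = None
        for way in range(WAYS):
            self.reads += 1
            entry = self.ways[way][self._index(key, way)]
            if entry is not None and entry[0] == key:
                result = entry[1]
        return result


table = CuckooTable(slots_per_way=1024)
keys = [i.to_bytes(38, "little") for i in range(2048)]  # 38 bytes hold a 300-bit key
for i, k in enumerate(keys):
    table.insert(k, i)
hits = [table.lookup(k) for k in keys]
print(table.reads / len(keys))  # reads per search for this baseline
```

The table is filled to a 50% load factor here only to keep the example small. Any scheme that cuts the average below four reads has to avoid touching some of the candidate slots for most searches, which is where the extra logic and on-chip memory listed in the trade-offs would be spent.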

The following table shows three examples of implementing a CAM with a 300-bit key, a 120-bit value, and a target rate of 50 MS/s:

  • A low-capacity, 16 K entry SRAM-based CAM.
  • Two high-capacity, 4 M entry CAMs on a 2 x 16-bit LPDDR5-6400 channel, one using the standard cuckoo hash method and one using the fast cuckoo hash method.
Table 1. The Demands of Various 50 MS/s Exact Match CAM Implementations

Method       Memory  DRAM Bandwidth Used (Mreads/s)  DRAM Bandwidth Used (%)  Search Latency (ns)  LUTs    BRAM36  URAMs
Cuckoo       SRAM    0                               N/A                      100                  4,000   0       4
Cuckoo       LPDDR5  200                             100                      600                  8,000   2       0
Fast cuckoo  LPDDR5  66                              33                       800                  10,000  8       0

Compared with the SRAM-based CAM, the high-capacity CAMs incur extra costs in DRAM bandwidth, latency, LUTs, and block RAM. The fast cuckoo method uses only 33% of the available DRAM bandwidth, which leaves the DRAM channel free to serve other functions beyond CAM operations. This bandwidth saving comes at a cost relative to the standard cuckoo method: search latency rises from 600 ns to 800 ns, LUT usage from 8,000 to 10,000, and block RAM usage from 2 to 8 BRAM36.
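The bandwidth figures can be cross-checked with simple arithmetic: DRAM reads per second equal the search rate multiplied by the average reads per search. The sketch below assumes that the 2 x 16-bit LPDDR5-6400 channel sustains 200 Mreads/s, a value inferred from the table (200 Mreads/s is listed as 100%) rather than stated as a channel specification. The function and constant names are hypothetical.

```python
# Channel capacity in Mreads/s. Assumption inferred from Table 1, where
# 200 Mreads/s is listed as 100% of the 2 x 16-bit LPDDR5-6400 channel.
CHANNEL_CAPACITY_MREADS = 200.0


def dram_load(search_rate_ms, reads_per_search):
    """Return (Mreads/s, fraction of channel) for a search rate given in MS/s."""
    mreads = search_rate_ms * reads_per_search
    return mreads, mreads / CHANNEL_CAPACITY_MREADS


def max_search_rate(reads_per_search):
    """Highest search rate in MS/s that fits in the assumed channel capacity."""
    return CHANNEL_CAPACITY_MREADS / reads_per_search


standard = dram_load(50, 4.0)  # standard cuckoo: 4 reads per search
fast = dram_load(50, 1.3)      # fast cuckoo: 1.3 reads per search on average
print(standard, fast, max_search_rate(1.3))
```

Under these assumptions, the standard method needs 200 Mreads/s (the whole channel) and the fast method needs about 65 Mreads/s (about 32.5%). The table lists 66 Mreads/s and 33%, which suggests an unrounded average slightly above 1.3 reads per search. The same arithmetic gives a ceiling of roughly 154 MS/s for the fast method, in line with the quoted rate of up to 150 MS/s on this channel.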