AMD enhancements to the 4-way cuckoo-hashing algorithm reduce the average number of memory reads per search from 4 to 1.3, even at high load factors. These improvements enable exact-match search rates of up to 90 MS/s on a 72-bit DDR4-2400 channel and up to 150 MS/s on a 2 x 16-bit LPDDR5-6400 channel. Higher search rates come with trade-offs:
- Moderate increase in search latency
- Moderate rise in LUT and block RAM usage
- Key and value size reduced by 16 bits
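
To make the read count concrete, the following is a minimal sketch of a baseline 4-way cuckoo lookup, in which every probe of a candidate slot counts as one memory read. It does not model the AMD enhancements, which are not described here. The names `CuckooTable`, `slot_index`, `SLOTS_PER_WAY`, and `MAX_KICKS`, the BLAKE2b hash, and the table depth are all assumptions chosen for illustration.

```python
import hashlib

NUM_WAYS = 4          # 4-way cuckoo hashing: each key has 4 candidate slots
SLOTS_PER_WAY = 1024  # hypothetical table depth, for illustration only
MAX_KICKS = 64        # hypothetical limit on relocations per insert


def slot_index(key: bytes, way: int) -> int:
    """Pick the key's candidate slot in one way, using a per-way salt."""
    digest = hashlib.blake2b(key, digest_size=8, salt=bytes([way])).digest()
    return int.from_bytes(digest, "big") % SLOTS_PER_WAY


class CuckooTable:
    def __init__(self) -> None:
        # One array per way; each slot holds None or a (key, value) pair.
        self.ways = [[None] * SLOTS_PER_WAY for _ in range(NUM_WAYS)]

    def lookup(self, key: bytes):
        """Return (value, reads); probe the ways in order until a hit."""
        reads = 0
        for way in range(NUM_WAYS):
            reads += 1  # each probe models one memory read
            entry = self.ways[way][slot_index(key, way)]
            if entry is not None and entry[0] == key:
                return entry[1], reads
        return None, reads  # a miss always costs NUM_WAYS reads

    def insert(self, key: bytes, value: bytes) -> bool:
        """Insert a key that is not yet present, relocating residents if needed."""
        entry = (key, value)
        for kick in range(MAX_KICKS):
            for way in range(NUM_WAYS):
                idx = slot_index(entry[0], way)
                if self.ways[way][idx] is None:
                    self.ways[way][idx] = entry
                    return True
            # All 4 candidate slots are full: evict one resident, retry with it.
            way = kick % NUM_WAYS
            idx = slot_index(entry[0], way)
            entry, self.ways[way][idx] = self.ways[way][idx], entry
        # Gave up: the last evicted entry is dropped in this sketch.
        # A real design would rehash or report the table as full.
        return False


table = CuckooTable()
for n in range(100):
    table.insert(n.to_bytes(4, "big"), b"value-%d" % n)

value, reads = table.lookup((7).to_bytes(4, "big"))
print(value, reads)
```

In this baseline a hit costs between 1 and 4 reads and a miss always costs 4, which is the worst-case figure that the enhanced algorithm improves on.
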
The following table shows three example implementations of a CAM with a 300-bit key, a 120-bit value, and a target rate of 50 MS/s:
- A low-capacity, 16 K-entry SRAM-based CAM.
- Two high-capacity, 4 M-entry CAMs using 2 x 16-bit LPDDR5-6400, one with the standard cuckoo hash method and one with the fast cuckoo hash method.
| Method | Memory | DRAM Bandwidth Used (Mreads/s) | DRAM Bandwidth Used (%) | Search Latency (ns) | LUTs | BRAM36 | URAMs |
|---|---|---|---|---|---|---|---|
| Cuckoo | SRAM | 0 | N/A | 100 | 4000 | 0 | 4 |
| Cuckoo | LPDDR5 | 200 | 100 | 600 | 8000 | 2 | 0 |
| Fast cuckoo | LPDDR5 | 66 | 33 | 800 | 10000 | 8 | 0 |
Compared with the SRAM-based CAM, the high-capacity CAMs incur extra costs in DRAM bandwidth, latency, LUTs, and block RAM. The fast cuckoo method uses only 33% of the available DRAM bandwidth, which leaves the DRAM channels free to serve functions other than CAM operations. However, this bandwidth saving comes at the expense of higher latency (800 ns versus 600 ns), more LUTs (10,000 versus 8,000), and more block RAM (8 versus 2 BRAM36).
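
The bandwidth figures can be checked with a short calculation. The sketch below multiplies the search rate by the average reads per search. It assumes that the 2 x 16-bit LPDDR5-6400 channel sustains about 200 Mreads/s, a figure inferred from the table row that shows 200 Mreads/s at 100% utilization. The helper names `dram_load` and `max_rate` are hypothetical.

```python
# Assumption: channel capacity inferred from the table (200 Mreads/s at 100%).
DRAM_CAPACITY_MREADS = 200.0


def dram_load(rate_msps: float, reads_per_search: float):
    """Return (Mreads/s, percent of channel capacity) for a search rate."""
    mreads = rate_msps * reads_per_search
    return mreads, 100.0 * mreads / DRAM_CAPACITY_MREADS


def max_rate(reads_per_search: float) -> float:
    """Highest search rate (MS/s) the channel could sustain at this read cost."""
    return DRAM_CAPACITY_MREADS / reads_per_search


standard = dram_load(50, 4.0)  # standard cuckoo: 4 reads per search
fast = dram_load(50, 1.3)      # fast cuckoo: 1.3 reads per search on average

print("standard:", standard)
print("fast:", fast)
print("max rate at 4 reads:", max_rate(4.0))
print("max rate at 1.3 reads:", max_rate(1.3))
```

Under these assumptions the standard method saturates the channel at 50 MS/s, while the fast method needs about 65 Mreads/s (32.5%), close to the 66 Mreads/s and 33% in the table. The same model puts the fast-method ceiling at roughly 154 MS/s, in line with the quoted rate of up to 150 MS/s. This is a simplified model that ignores table updates, refresh, and other DRAM overheads.
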