Cached DRAM Binary CAM LogiCORE IP Product Guide (PG427)

Document ID: PG427
Release Date: 2024-11-27
Version: 1.2 English

Lookup Performance

A high-capacity CDBCAM system consists of a high-capacity DBCAM and a lower-capacity BCAM Cache Subsystem (BCS). The DBCAM uses DRAM for storage, so it has a higher latency and generally a lower lookup rate than the on-chip BCAM. The BCS is an on-chip cache BCAM that provides lower latency and a high lookup rate. When the DBCAM is the bottleneck, the resulting lookup rate of the system is inversely proportional to the Cache Miss Rate (CMR), but it can never exceed the maximum lookup rate of the BCS. The resulting lookup rate can therefore be expressed as:

  • LR = CLR, for CLR*CMR ≤ DLR.
    Note: To achieve this rate, the AXI4-Stream rate must be higher than LR.
  • LR = DLR/CMR, for CLR*CMR > DLR.

where:

  • LR is the resulting lookup rate;
  • CLR is the cache (BCS) lookup rate;
  • DLR is the DBCAM lookup rate;
  • CMR is the cache (BCS) miss rate. For example, for 1 in 10 lookups being a miss, the CMR would be 0.1.
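
The two cases above are equivalent to LR = min(CLR, DLR/CMR). The following Python sketch shows one way to evaluate this. It is a minimal illustration, not part of the IP deliverables: the function name `resulting_lookup_rate` and its signature are assumptions made for this example, and the AXI4-Stream rate limit mentioned in the note is not modeled.

```python
def resulting_lookup_rate(clr: float, dlr: float, cmr: float) -> float:
    """Estimate the CDBCAM lookup rate LR from CLR, DLR and CMR.

    clr, dlr: cache (BCS) and DBCAM lookup rates in the same unit,
              for example M lookups/s.
    cmr:      cache (BCS) miss rate, between 0 and 1.
    The name and signature are illustrative only.
    """
    if not 0.0 <= cmr <= 1.0:
        raise ValueError("cmr must be between 0 and 1")
    if clr <= 0 or dlr <= 0:
        raise ValueError("clr and dlr must be positive")
    if clr * cmr <= dlr:
        # Cache-limited: the DBCAM keeps up with the miss traffic.
        return clr
    # DBCAM-limited: misses arrive faster than the DBCAM can serve them.
    return dlr / cmr
```

For example, with CLR = 300 and DLR = 115 (both in M lookups/s), a miss rate of 0.1 gives CLR*CMR = 30, which is below DLR, so the function returns 300. A miss rate of 0.5 gives CLR*CMR = 150, which exceeds DLR, so the function returns 115/0.5 = 230.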

The expected performance is a function of three variables (CLR, DLR, and CMR), one of which (CMR) depends on the content of the lookup flow. To visualize the expected performance, the graph below plots the resulting lookup rate as a function of the cache (BCS) miss rate for a typical case with the following characteristics:

  • CLR = 300 M lookups/s
  • DLR = 115 M lookups/s
Figure 1. Achievable Lookup Throughput Performance
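
The data points behind such a curve can be tabulated directly from the formula. The sketch below uses the typical-case values CLR = 300 and DLR = 115 M lookups/s and assumes LR = min(CLR, DLR/CMR). The names `lookup_rate`, `knee_cmr`, and `curve` are hypothetical and chosen only for this example.

```python
CLR = 300.0  # cache (BCS) lookup rate, M lookups/s (typical case above)
DLR = 115.0  # DBCAM lookup rate, M lookups/s (typical case above)


def lookup_rate(cmr: float, clr: float = CLR, dlr: float = DLR) -> float:
    """LR = CLR while CLR*CMR <= DLR, otherwise DLR/CMR."""
    return clr if clr * cmr <= dlr else dlr / cmr


# Miss rate at which the system changes from cache-limited to DBCAM-limited.
knee_cmr = DLR / CLR


def curve(steps: int = 10):
    """Return (CMR, LR) pairs for CMR = 0, 1/steps, ..., 1."""
    return [(i / steps, lookup_rate(i / steps)) for i in range(steps + 1)]


if __name__ == "__main__":
    print(f"knee at CMR = {knee_cmr:.3f}")
    for cmr, lr in curve():
        print(f"CMR = {cmr:.1f}  LR = {lr:6.1f} M lookups/s")
```

With these values, the rate stays at 300 M lookups/s up to a miss rate of about 0.38 (DLR/CLR) and then falls as DLR/CMR, reaching 115 M lookups/s when every lookup misses the cache.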

DBCAM (CDBCAM without cache) Lookup Throughput Performance

Because the resulting lookup performance depends on the DBCAM lookup performance, the following table gives examples of achievable DBCAM performance at a 50% table fill level.

Table 1. DBCAM Lookup Throughput Performance and Latency
DRAM Type   DBCAM Entry Size [B] (1)   Mean DLR [Mlookups/s]   Mean Lookup Latency [μs]
DDR4        64                         115                     0.73
DDR4        128                        78                      0.91
HBM         64                         109                     0.98
HBM         128                        83                      1.2
  1. For DBCAM, the entry size is calculated as follows:
    • (KEY_WIDTH + RESPONSE_WIDTH) < 496 bits: 64B
    • (KEY_WIDTH + RESPONSE_WIDTH) >= 496 bits: 128B
  2. The DDR4 results reflect a DDR4-3200AC component clocked at 3200 MHz. Performance varies with the RDIMM part.
  3. The HBM results reflect HBM clocked at 2400 MHz. Higher rates are possible at a 3200 MHz clock frequency.
  4. Higher lookup rates can be achieved when using more than one HBM pseudo-channel. Lookup performance is capped at around half of the mem_clk frequency.
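
The entry size rule in note 1 and the values in Table 1 can be combined into a small lookup helper. The Python sketch below is an illustration only: the names `dbcam_entry_size_bytes`, `TABLE_1`, and `table_1_lookup` are hypothetical, the numbers are copied from Table 1 (50% table fill level), and any upper limit on the supported key and response widths is not modeled.

```python
def dbcam_entry_size_bytes(key_width: int, response_width: int) -> int:
    """Entry size per table note 1: below 496 bits gives 64 B, otherwise 128 B.

    key_width and response_width are in bits (KEY_WIDTH, RESPONSE_WIDTH).
    """
    if key_width < 0 or response_width < 0:
        raise ValueError("widths must be non-negative")
    return 64 if key_width + response_width < 496 else 128


# Mean values copied from Table 1.
# (DRAM type, entry size in bytes) -> (mean DLR in M lookups/s, mean latency in us)
TABLE_1 = {
    ("DDR4", 64): (115, 0.73),
    ("DDR4", 128): (78, 0.91),
    ("HBM", 64): (109, 0.98),
    ("HBM", 128): (83, 1.2),
}


def table_1_lookup(dram_type: str, key_width: int, response_width: int):
    """Return (entry size [B], mean DLR [M lookups/s], mean latency [us])."""
    size = dbcam_entry_size_bytes(key_width, response_width)
    dlr, latency = TABLE_1[(dram_type, size)]
    return size, dlr, latency
```

For example, `table_1_lookup("DDR4", 128, 32)` selects a 64B entry because 128 + 32 = 160 bits is below 496 bits, and returns `(64, 115, 0.73)`. A 480-bit key with a 16-bit response totals exactly 496 bits, so `table_1_lookup("HBM", 480, 16)` selects a 128B entry and returns `(128, 83, 1.2)`.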