Overview

Cached DRAM Binary CAM LogiCORE IP Product Guide (PG427)

Document ID: PG427
Release Date: 2023-10-18
Version: 1.0 English

The Cached DRAM Binary CAM (CDBCAM) core is related to the on-chip BCAMs provided by AMD; see the Binary CAM Search LogiCORE IP Product Guide (PG317). The main difference is that CDBCAM uses DRAM as the primary storage for entries, whereas the on-chip BCAM uses FPGA internal SRAM. CDBCAM can store orders of magnitude more entries and, in combination with a BCAM cache implemented as an on-chip BCAM, can achieve lookup rates comparable to those of the on-chip BCAM.

Only the AMD Versal™ architecture is supported. Two types of DRAM are supported: HBM and DDR4.

The CDBCAM stores {key, response} entries in DRAM. The Lookup interface of the CDBCAM receives a lookup key and outputs a result that indicates whether the lookup key matches the key of any entry in the CDBCAM. If an entry matches, the response value of the matching entry is output. Otherwise, the software-programmable default response value is output.
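The lookup behavior described above can be modeled in a few lines of C. This is a minimal behavioral sketch, not the core's implementation: the type and function names (`cam_entry`, `cam_result`, `cam_model_lookup`), the 32-bit key, and the 32-bit response are assumptions chosen for illustration, and the linear scan only stands in for the hardware search.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define KEY_BYTES 4  /* hypothetical: models KEY_WIDTH = 32 bits */

/* One {key, response} entry; this layout is illustrative only. */
typedef struct {
    uint8_t  key[KEY_BYTES];
    uint32_t response;
} cam_entry;

typedef struct {
    bool     match;     /* true if some entry's key equals the lookup key */
    uint32_t response;  /* matching entry's response, or the default response */
} cam_result;

/* Behavioral model only: the real core resolves lookups in hardware,
 * not with a linear scan over the table. */
cam_result cam_model_lookup(const cam_entry *entries, size_t num_entries,
                            const uint8_t key[KEY_BYTES],
                            uint32_t default_response)
{
    cam_result r = { false, default_response };
    for (size_t i = 0; i < num_entries; i++) {
        if (memcmp(entries[i].key, key, KEY_BYTES) == 0) {
            r.match = true;
            r.response = entries[i].response;
            break;
        }
    }
    return r;
}
```

On a miss, the model returns `match == false` together with the default response, mirroring the two outcomes of the Lookup interface.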

The entries are read and written using a set of high-level API functions. The API functions are written in C and delivered as part of the IP. The API encapsulates the details of memory management and register access and provides a simple and efficient management interface. The API software, with detailed documentation, is available on the CAM IP Product Page. You provide the API with functions for basic hardware reads and writes. This allows flexible hardware mapping: the communication link between the API software and the hardware can be designed to your specifications. For instance, the communication link could be AXI4-Lite or PCIe®.
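One common way to supply such read and write functions in C is a small table of function pointers plus an opaque context. The sketch below illustrates that pattern with a mock register file in place of a real AXI4-Lite or PCIe link. All names and signatures here (`hw_access`, `hw_write_fn`, `hw_read_fn`, `mock_bus`, `hw_write_verify`) are hypothetical; the delivered API defines its own types, which are documented with the API software.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback types; the delivered API defines its own. */
typedef int (*hw_write_fn)(void *ctx, uint32_t addr, uint32_t data);
typedef int (*hw_read_fn)(void *ctx, uint32_t addr, uint32_t *data);

/* Bundle of user-provided accessors plus an opaque link context. */
typedef struct {
    void       *ctx;
    hw_write_fn write;
    hw_read_fn  read;
} hw_access;

/* Mock backend: a small register file standing in for the real link. */
#define MOCK_NUM_REGS 64u

typedef struct {
    uint32_t regs[MOCK_NUM_REGS];
} mock_bus;

static int mock_write(void *ctx, uint32_t addr, uint32_t data)
{
    mock_bus *bus = (mock_bus *)ctx;
    if (addr % 4u != 0u || addr / 4u >= MOCK_NUM_REGS)
        return -1;  /* unaligned or out-of-range address */
    bus->regs[addr / 4u] = data;
    return 0;
}

static int mock_read(void *ctx, uint32_t addr, uint32_t *data)
{
    mock_bus *bus = (mock_bus *)ctx;
    if (addr % 4u != 0u || addr / 4u >= MOCK_NUM_REGS)
        return -1;
    *data = bus->regs[addr / 4u];
    return 0;
}

/* Bind the mock backend to the accessor table. */
hw_access mock_hw_access(mock_bus *bus)
{
    hw_access hw = { bus, mock_write, mock_read };
    return hw;
}

/* Management-side code touches hardware only through the callbacks,
 * so the same code works over any link the callbacks implement. */
int hw_write_verify(const hw_access *hw, uint32_t addr, uint32_t data)
{
    uint32_t readback = 0;
    if (hw->write(hw->ctx, addr, data) != 0)
        return -1;
    if (hw->read(hw->ctx, addr, &readback) != 0)
        return -1;
    return readback == data ? 0 : -1;
}
```

Replacing `mock_write` and `mock_read` with functions that drive the actual link is the only change needed to move this pattern from simulation to hardware.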

A CDBCAM design is highly configurable at compile time, which makes it suitable for a large variety of applications. The following table lists the configuration parameters.

Table 1. Configuration Parameters
| Parameter Name | Values | Description |
|---|---|---|
| CLOCKING_MODE | SINGLE_CLOCK / DUAL_CLOCK | The use of a separate memory clock is optional. In SINGLE_CLOCK mode, all logic (except a small amount of control logic in the AXI4-Lite domain) is clocked by the lookup interface clock. |
| LOOKUP_INTERFACE_FREQ | 15-400 MHz | Frequency of the lookup request interface. In SINGLE_CLOCK mode, the maximum supported frequency is 250 MHz. In DUAL_CLOCK mode, the maximum supported frequency is 400 MHz. |
| MEMORY_INTERFACE_FREQ | 15-250 MHz | Frequency of the memory interface in DUAL_CLOCK mode. |
| DRAM_TYPE | DDR4, HBM | DRAM type. The selection also determines the NoC NMU width, which is 32B for HBM and 64B for DDR4. |
| NUM_ENTRIES | 8K-60M | Number of entries that can be stored in the CDBCAM IP. |
| KEY_WIDTH | 10-992 | Key width in bits. The width of the lookup key. KEY_WIDTH + RESPONSE_WIDTH + 1 cannot exceed 1016. |
| RESPONSE_WIDTH | 1-1006 | Response width in bits. The width of the lookup response. KEY_WIDTH + RESPONSE_WIDTH + 1 cannot exceed 1016. If KEY_WIDTH + RESPONSE_WIDTH < 512, each entry takes 64B of DRAM; otherwise it takes 128B. |
| DEFAULT_RESPONSE_VALUE | Any value of RESPONSE_WIDTH bits | User-defined default response in case of no match. |
| MEMORY_PRIMITIVE | ULTRA, BLOCK | Memory type for storing cache entries. |
| CACHE_SUPPORT | QUARTER, HALF, FULL, NONE | Cache support can be disabled (NONE). When enabled, the maximum lookup rate and the cache size per memory type are as follows. FULL: full streaming rate with 16K (Ultra) / 2K (Block) entries (see note 1). HALF: half streaming rate with 8K (Ultra) / 1K (Block) entries. QUARTER: quarter streaming rate with 4K (Ultra) / 512 (Block) entries. |
| UPDATE_MODE | SOFTWARE, HARDWARE | In software mode, table management requires a shadow memory, and performance depends on the processor. In hardware mode, no additional memory is required and performance is independent of the processor, but extra hardware resources are needed. Insert, delete, and update operations are still triggered by software in hardware mode. |
| NUM_PCS | 1, 2, 4, 8 | Number of HBM PCs, determined by entry size and the number of entries. The value should be set in the GUI, but it can be overridden when more HBM PCs are required than the GUI determines. For instance, if all the entries fit into two HBM PCs but you want to use four PCs for higher bandwidth, you can set NUM_PCS to 4. |

  1. Full streaming rate is equal to the lookup interface frequency and represents the maximum possible lookup rate.
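The numeric limits in Table 1 can be checked mechanically before a configuration is generated. The following is a minimal sketch under stated assumptions: the struct `cdbcam_cfg` and the function names are hypothetical and not part of the delivered API, only a subset of the parameters is covered, and the storage estimate counts entry storage only (any additional DRAM overhead the core might need is not described in this section and is ignored).

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { SINGLE_CLOCK, DUAL_CLOCK } clocking_mode;

/* Hypothetical container for a subset of the Table 1 parameters. */
typedef struct {
    clocking_mode clocking;
    uint32_t lookup_freq_mhz;
    uint32_t memory_freq_mhz;   /* only checked in DUAL_CLOCK mode */
    uint32_t key_width;         /* bits */
    uint32_t response_width;    /* bits */
} cdbcam_cfg;

/* Returns true if the values respect the ranges listed in Table 1. */
bool cdbcam_cfg_valid(const cdbcam_cfg *c)
{
    uint32_t max_lookup = (c->clocking == SINGLE_CLOCK) ? 250u : 400u;

    if (c->lookup_freq_mhz < 15u || c->lookup_freq_mhz > max_lookup)
        return false;
    if (c->clocking == DUAL_CLOCK &&
        (c->memory_freq_mhz < 15u || c->memory_freq_mhz > 250u))
        return false;
    if (c->key_width < 10u || c->key_width > 992u)
        return false;
    if (c->response_width < 1u || c->response_width > 1006u)
        return false;
    /* KEY_WIDTH + RESPONSE_WIDTH + 1 cannot exceed 1016. */
    return c->key_width + c->response_width + 1u <= 1016u;
}

/* DRAM bytes per entry: 64B if KEY_WIDTH + RESPONSE_WIDTH < 512, else 128B. */
uint32_t cdbcam_entry_bytes(const cdbcam_cfg *c)
{
    return (c->key_width + c->response_width < 512u) ? 64u : 128u;
}

/* Entry storage only; assumes no other DRAM overhead (an assumption). */
uint64_t cdbcam_entry_storage_bytes(const cdbcam_cfg *c, uint64_t num_entries)
{
    return num_entries * cdbcam_entry_bytes(c);
}
```

For example, a 128-bit key with a 32-bit response falls in the 64B entry class, so one million entries need at least 64,000,000 bytes of entry storage.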

There are two options for configuration parameters:

  1. With Vitis Networking P4, all of these parameters are extracted or calculated from the P4 code during compilation. Vitis Networking P4 ensures that the parameters used to generate the hardware CDBCAM and those used to create the software CDBCAM are synchronized.
  2. For standalone usage, you must guarantee that the parameters used to generate the hardware CDBCAM and the parameters used to call the software CDBCAM are identical.
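For standalone usage, one simple safeguard is to keep both parameter sets in the same structure and compare them at startup. The sketch below assumes a hypothetical `cdbcam_params` struct that holds a few of the Table 1 parameters; it is not the delivered API, and how the hardware-side values are obtained (for example from a generated header) is left to the integration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of the parameters that must match on both sides. */
typedef struct {
    uint32_t num_entries;
    uint32_t key_width;
    uint32_t response_width;
    uint32_t num_pcs;
} cdbcam_params;

/* Returns the name of the first parameter that differs between the
 * hardware-generation values and the software values, or NULL if all
 * compared parameters are identical. */
const char *cdbcam_params_mismatch(const cdbcam_params *hw,
                                   const cdbcam_params *sw)
{
    if (hw->num_entries != sw->num_entries)
        return "NUM_ENTRIES";
    if (hw->key_width != sw->key_width)
        return "KEY_WIDTH";
    if (hw->response_width != sw->response_width)
        return "RESPONSE_WIDTH";
    if (hw->num_pcs != sw->num_pcs)
        return "NUM_PCS";
    return NULL;
}
```

Calling this check once before the first API call turns a silent configuration mismatch into an explicit error that names the offending parameter.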

Only one DDR4 channel is supported. For HBM, multiple HBM pseudo channels (PCs) can be used; see Example Design for more details.
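As a rough illustration of how the number of HBM PCs relates to entry size and entry count, the sketch below picks the smallest allowed NUM_PCS value whose combined capacity holds all entries. This is an assumption-laden estimate, not the calculation the GUI performs: the function name `min_num_pcs` is hypothetical, the per-PC capacity is device dependent and must be supplied by the caller, and any storage overhead beyond the raw entries is ignored.

```c
#include <stdint.h>

/* Smallest NUM_PCS in {1, 2, 4, 8} whose total capacity fits all entries,
 * or 0 if even 8 PCs are not enough.
 * pc_capacity_bytes is caller-supplied (device dependent, not taken from
 * this guide); entry_bytes is 64 or 128 per the RESPONSE_WIDTH rule. */
uint32_t min_num_pcs(uint64_t num_entries, uint32_t entry_bytes,
                     uint64_t pc_capacity_bytes)
{
    uint64_t needed = num_entries * entry_bytes;

    for (uint32_t pcs = 1u; pcs <= 8u; pcs *= 2u) {
        if ((uint64_t)pcs * pc_capacity_bytes >= needed)
            return pcs;
    }
    return 0u;
}
```

The result is only a lower bound on capacity grounds; as the NUM_PCS description notes, a larger value can still be chosen to gain bandwidth.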