Decomposing Deep Memory Configurations for Balanced Power and Performance - 2021.2 English

Vivado Design Suite User Guide: Design Analysis and Closure Techniques (UG906)

Document ID: UG906
Release Date: 2021-10-27
Version: 2021.2 English

In deep memory configurations, the RAM_DECOMP synthesis attribute can be used to improve memory decomposition and reduce power consumption. This attribute can be set in the RTL. When the RAM_DECOMP attribute is applied to a memory, the memory is built from primitives in a wider configuration instead of a deep and narrow configuration.

When the CASCADE_HEIGHT attribute is used along with the RAM_DECOMP attribute, synthesis inference has finer control over cascading, which makes it possible to balance power and performance. This approach requires additional address decoding logic but reduces the number of block RAMs accessed at any given time, which helps reduce power consumption. The 32 × 16K memory configuration in the following figure is used as an example of how a memory is decomposed when the RAM_DECOMP and CASCADE_HEIGHT attributes are set.

Figure 1. 32 × 16K Memory Configuration

If the attributes RAM_DECOMP = power and CASCADE_HEIGHT = 4 are applied, 16 RAMB36E2 primitives are inferred and the memory is decomposed as shown in the following figure.

Figure 2. Generated Structure for 32 × 16K Memory Configuration using RAM_DECOMP and CASCADE_HEIGHT Attributes

The base primitive used here is 32 × 1K. Four block RAMs are cascaded using the built-in cascade feature to form a 32 × 4K configuration, and four such structures in parallel create a 16K deep memory. The outputs of the four structures are multiplexed through a 4:1 MUX to generate the output data.
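The address decoding implied by this structure can be modeled in a few lines of Python. This is a behavioral sketch, not the RTL or the netlist that Vivado generates. The `decode` function, the `Location` type, and the assumption that the upper address bits first select the cascade chain and then the block RAM within it are illustrative choices that do not come from the tool.

```python
from dataclasses import dataclass

BASE_DEPTH = 1024  # words per base primitive (32 x 1K), as stated in the text


@dataclass(frozen=True)
class Location:
    chain: int     # parallel cascade chain, which is also the output MUX input
    position: int  # block RAM inside that chain
    offset: int    # word address inside the selected block RAM


def decode(address: int, depth: int = 16 * 1024, cascade_height: int = 4) -> Location:
    """Map a word address to the one block RAM that would hold it.

    Assumed mapping (hypothetical): consecutive 1K address ranges fill one
    chain before moving on to the next chain.
    """
    if not 0 <= address < depth:
        raise ValueError("address out of range")
    primitive = address // BASE_DEPTH  # index among all base primitives
    return Location(
        chain=primitive // cascade_height,
        position=primitive % cascade_height,
        offset=address % BASE_DEPTH,
    )


if __name__ == "__main__":
    for addr in (0, 5000, 16383):
        print(addr, decode(addr))
```

Under this assumed mapping, each address resolves to exactly one `(chain, position)` pair, which corresponds to the single block RAM that needs to be enabled for that access.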

Figure 3. RTL Code Snippet for 32 × 16K Memory Configuration using RAM_DECOMP and CASCADE_HEIGHT Attributes

If only the RAM_DECOMP = power attribute is applied, 16 RAMB36E2 primitives are inferred and the memory is decomposed as shown in the following figure.

Figure 4. Generated Structure for 32 × 16K Memory Configuration using RAM_DECOMP Attribute

The base primitive used here is 32 × 1K. Eight block RAMs are cascaded using the built-in cascade feature to form a 32 × 8K configuration, and two such structures in parallel create a 16K deep memory. The outputs of the two structures are multiplexed through a 2:1 MUX to generate the output data.

Figure 5. RTL Code Snippet for 32 × 16K Memory Configuration using RAM_DECOMP Attribute

The overall power savings are similar for both memory decomposition examples, shown in Figure 2 and Figure 4, because only one block RAM is active at any given time. However, the four-level deep cascaded block RAM chain (Figure 2) provides better performance than the eight-level deep cascaded block RAM chain (Figure 4).
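The structural figures for the two examples can be reproduced with simple arithmetic. The following Python sketch is only a bookkeeping aid: the `summarize` helper and its dictionary keys are hypothetical names, and the calculation assumes the 32 × 1K base primitive described in this section.

```python
def summarize(depth: int, cascade_height: int, base_depth: int = 1024) -> dict:
    """Count primitives, parallel chains, and MUX inputs for one decomposition."""
    primitives = -(-depth // base_depth)       # ceiling division
    chains = -(-primitives // cascade_height)  # parallel cascade chains
    return {
        "primitives": primitives,         # RAMB36E2 count
        "cascade_depth": cascade_height,  # block RAMs per cascade chain
        "chains": chains,
        "mux_inputs": chains,             # one MUX input per chain
    }


if __name__ == "__main__":
    # 32 x 16K memory with cascade chains of four, then of eight
    print(summarize(16 * 1024, cascade_height=4))
    print(summarize(16 * 1024, cascade_height=8))
```

Both calls report 16 primitives. The first yields four chains of four with a 4:1 MUX, and the second yields two chains of eight with a 2:1 MUX, matching the two structures compared above.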