Decomposing Deep Memory Configurations for Balanced Power and Performance

Vivado Design Suite User Guide: Design Analysis and Closure Techniques (UG906)

Document ID: UG906
Release Date: 2025-12-10
Version: 2025.2 English

In deep memory configurations, the synthesis attribute RAM_DECOMP can be applied in the RTL to improve memory decomposition and reduce power consumption. When applied, the memory is configured in a wider arrangement of primitives instead of a deep and narrow configuration.

When the CASCADE_HEIGHT attribute is used together with RAM_DECOMP, you gain finer control over how synthesis cascades the inferred block RAMs, which allows power and performance to be balanced. This approach requires additional address decoding logic, but it reduces the number of block RAMs accessed at a time, which helps lower power consumption.

Figure 1. 32 × 16K Memory Configuration

For example, applying RAM_DECOMP = power and CASCADE_HEIGHT = 4 to a 32 × 16K memory infers 16 RAMB36E2 blocks and decomposes the memory as shown in the following figure.

Figure 2. Generated Structure for 32 × 16K Memory Configuration Using RAM_DECOMP and CASCADE_HEIGHT Attributes

The base primitive in this configuration is 32 × 1K. Four block RAMs are cascaded to form a 32 × 4K configuration. Four such parallel structures create a 16K-deep memory, with their outputs multiplexed through a 4:1 MUX to generate the output data.

Figure 3. RTL Code Snippet for 32 × 16K Memory Configuration Using RAM_DECOMP and CASCADE_HEIGHT Attributes
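The original snippet is not reproduced here. As a minimal sketch of what such RTL might look like, the following assumes a single-port memory with a synchronous read. The module name, port names, and read behavior are hypothetical; only the two attributes and the 32 × 16K dimensions come from the text above.

```verilog
// Hypothetical module and port names, for illustration only.
module ram_32x16k_decomp_cascade (
    input  wire        clk,
    input  wire        we,
    input  wire [13:0] addr,  // 2^14 = 16K locations
    input  wire [31:0] din,
    output reg  [31:0] dout
);
    // Power-oriented decomposition, with each cascade chain
    // limited to four block RAMs.
    (* ram_decomp = "power", cascade_height = 4 *)
    reg [31:0] mem [0:16383];

    always @(posedge clk) begin
        if (we)
            mem[addr] <= din;
        dout <= mem[addr];    // registered read, as block RAM inference expects
    end
endmodule
```

The attributes are attached to the memory array declaration rather than to the module, because that array is the object the synthesis tool maps to block RAM.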

If only RAM_DECOMP = power is applied, 16 RAMB36E2 blocks are still inferred, but the decomposition changes as shown in the following figure.

Figure 4. Generated Structure for 32 × 16K Memory Configuration Using RAM_DECOMP Attribute

In this case, the base primitive is 32 × 1K, with eight block RAMs cascaded to form a 32 × 8K configuration. Two such parallel structures create a 16K-deep memory, with outputs multiplexed through a 2:1 MUX.

Figure 5. RTL Code Snippet for 32 × 16K Memory Configuration Using RAM_DECOMP Attribute
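The original snippet is not reproduced here. A minimal sketch of the RAM_DECOMP-only case, under the same assumptions (hypothetical module and port names, single-port memory, synchronous read), might look like this:

```verilog
// Hypothetical module and port names, for illustration only.
module ram_32x16k_decomp (
    input  wire        clk,
    input  wire        we,
    input  wire [13:0] addr,  // 2^14 = 16K locations
    input  wire [31:0] din,
    output reg  [31:0] dout
);
    // Power-oriented decomposition only; the cascade height is
    // left for the synthesis tool to choose.
    (* ram_decomp = "power" *)
    reg [31:0] mem [0:16383];

    always @(posedge clk) begin
        if (we)
            mem[addr] <= din;
        dout <= mem[addr];    // registered read, as block RAM inference expects
    end
endmodule
```

Apart from the module name, the only difference between this sketch and one that also limits the cascade chain is the absence of the cascade_height attribute on the array declaration.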

Power savings are similar in both configurations (Figure 2 and Figure 4), because only one block RAM is active at a time. However, performance differs: a four-level deep cascaded block RAM chain (Figure 2) provides better performance than an eight-level deep cascaded block RAM chain (Figure 4).