A memory configuration can be decomposed into block RAM primitives in several ways, depending on the design requirement: performance, power, or a mixture of both.
The following example highlights the different structures that can be generated to meet your requirements. You can use the CASCADE_HEIGHT attribute to limit how many block RAMs synthesis cascades, which controls the performance/power trade-off. The usage and arguments for the attribute are described in the Vivado Design Suite User Guide: Synthesis (UG901).
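As an illustration of how the attribute is applied, the following is a minimal sketch of an inferred 8Kx32 single-port memory with CASCADE_HEIGHT set in RTL. The module name, port names, and the value 2 are assumptions chosen for this example and are not taken from this guide. The structure that synthesis actually builds depends on the target device and tool version, so check UG901 for the supported values.

```verilog
// Hypothetical example module: 8Kx32 single-port RAM inferred as block RAM.
// The names and the cascade_height value are illustrative assumptions.
module ram_8kx32 (
    input  wire        clk,
    input  wire        we,
    input  wire [12:0] addr,   // 2^13 = 8K locations
    input  wire [31:0] din,
    output reg  [31:0] dout
);
    // Request block RAM and limit the cascade chain to two primitives.
    (* ram_style = "block", cascade_height = 2 *)
    reg [31:0] mem [0:8191];

    // Synchronous write and synchronous read (read-first behavior).
    always @(posedge clk) begin
        if (we)
            mem[addr] <= din;
        dout <= mem[addr];
    end
endmodule
```

Changing the attribute value and comparing the post-synthesis utilization, timing, and power reports is one way to evaluate the trade-off for a given design.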
The following figure shows an example of an 8Kx32 memory configuration optimized for higher performance (timing).
In this implementation, all block RAMs are enabled on every read or write, so the structure consumes more power.
The following figure shows an example of cascading all the block RAMs for low power.
In this implementation, because only one block RAM at a time is selected (from each unit), the dynamic power contribution is almost half that of the high-performance structure. Block RAMs have a dedicated cascade MUX and routing structure that allows wide, deep memories requiring more than one block RAM primitive to be built in a very power-efficient configuration.
The following figure shows an example of how to limit the cascading to balance power and performance, often with no trade-off in performance.
Because two block RAMs are selected at a time in this implementation, the dynamic power contribution is lower than in the high-performance structure, but higher than in the low-power structure. The advantage of this structure over the low-power structure is that it has only two block RAMs in the cascaded path, rather than four in the critical path, which makes the target frequency easier to meet.