Allow Register Replication - 2024.2 English - 2024.1 English

UltraFast Design Methodology Guide for FPGAs and SoCs (UG949)

Document ID
UG949
Release Date
2024-11-13
Version
2024.2 English

Synthesis and physical optimization tools can replicate registers to reduce high fanout nets on critical paths. Alternatively, you can apply attributes on specific registers or levels of hierarchy to specify which registers can or cannot be replicated. For example, the presence of a LUT1 on a replicated net indicates that an attribute or constraint is partly preventing the optimization. During synthesis, a KEEP_HIERARCHY attribute on a hierarchical cell traversed by the optimized net or a KEEP attribute on net segment in a different hierarchy can alter the replication optimizations. During synthesis and implementation, a DONT_TOUCH constraint also prevents beneficial replications.

Sometimes, you address the high fanout nets in RTL or synthesis by using a MAX_FANOUT attribute on a specific net. This does not always result in the most optimal routing resource usage, especially if the MAX_FANOUT attribute is set too low or is set on a net connected to several major hierarchies. In addition, if the high fanout signal is a register control signal and is replicated more than necessary, this can lead to a higher number of control sets and increase design power by unnecessarily adding additional registers that may not be necessary for timing closure.

Often, a better approach to reducing fanout is to use a balanced tree for the high fanout signals. Consider manually replicating registers based on the design hierarchy, because the cells included in a hierarchy are often placed together.

To restructure and reduce the number of control set trees and high fanout nets, you can use the opt_design Tcl command with one of the following options:

-control_set_merge
This option aggressively combines the drivers of logically-equivalent control signals to a single driver.
-merge_equivalent_drivers
This option merges logically-equivalent signals, including control signals, to a single driver.
Note: Try the -merge_equivalent_drivers option first, because the tools are aware of major hierarchies and Pblock constraints when you run this option.

These options are the reverse of fanout replication and result in nets that are better suited for module-based replication. This merge also works across multi-stage reset trees as shown in the following figure.

Figure 1. Control Set Merging Using opt_design -control_set_merge

After reducing the number of replicated objects, you can use the opt_design Tcl command to perform limited replication based on the hierarchy characteristics, with the following option:

-hier_fanout_limit <arg>
This option replicates registers according to the hierarchy where <arg> represents the fanout limit for the replication according to the logical hierarchy. For each hierarchical instance driven by the high fanout net, if the fanout within the hierarchy is greater than the specified limit, the net within the hierarchy is driven by a replica of the driver of the high fanout net. The replicated driver is placed in the same level of hierarchy as the original driver, and replication is not limited to control set registers.

The following figure shows replication on a clock enable net with a fanout of 60000 using opt_design -hier_fanout_limit 1000. Because each module SR_1K contains 1000 loads, the driver is replicated 59 times.

Figure 2. Module-Based Replication on a High-Fanout Clock Enable Net

Fanout optimization is enabled by default in place_design. Replication occurs early in the placer flow and is based on placement information. Registers that drive more than 1000 loads and registers that drive DSPs, block RAMs, and UltraRAMs are considered for replication and are co-located with the loads if replication occurs. You can force the replication of a register or a LUT driving a net by adding the FORCE_MAX_FANOUT property to the net. The value of the FORCE_MAX_FANOUT specifies the maximum physical fanout the nets should have after the replication optimization.

You can force replication based on physical device attributes with the MAX_FANOUT_MODE property. Supported MAX_FANOUT_MODE properties are CLOCK_REGION, SLR, MACRO. For example, the MAX_FANOUT_MODE property with a value of CLOCK_REGION replicates the driver based on the physical clock region, the loads placed into same clock region will be clustered together. For more information, see this link in the Vivado Design Suite User Guide: Implementation (UG904).

For SSI technology devices, high-fanout drivers can be replicated for each SLR and optionally assigned to SLR-aligned Pblocks along with their loads. This technique helps reduce the impact of the SLR crossing delay and gives more freedom to place the replicated high fanout nets independently in each SLR.