Most tools can replicate registers to reduce high fanout nets on critical paths. Alternatively, you can apply attributes on specific registers or levels of hierarchy to specify which registers can or cannot be replicated. For example, the presence of a LUT1 on a replicated net indicates that an attribute or constraint is partly preventing the optimization. During synthesis, a KEEP_HIERARCHY attribute on a hierarchical cell traversed by the optimized net, or a KEEP attribute on a net segment in a different hierarchy, can alter the replication optimizations. During synthesis and implementation, a DONT_TOUCH constraint also prevents beneficial replications.
Sometimes, designers address high fanout nets in RTL or synthesis by using a MAX_FANOUT attribute on a specific net. This does not always result in optimal routing resource usage, especially if the MAX_FANOUT attribute is set too low or is set on a net connected to several major hierarchies. In addition, if the high fanout signal is a register control signal and is replicated more than necessary, this can lead to a higher number of control sets and increase design power by adding registers that are not needed for timing closure.
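Before choosing or tightening a MAX_FANOUT value, it can help to measure the current fanout and control set count. The following Tcl sketch assumes a synthesized or implemented design is open in Vivado; the limit of 25 nets is an arbitrary choice for illustration.

```tcl
# List the highest fanout nets with their worst slack and the types of loads
# they drive (clock enable, set/reset, data, and so on).
report_high_fanout_nets -timing -load_types -max_nets 25

# Summarize control sets to check whether earlier replication has inflated
# the number of unique clock/enable/reset combinations.
report_control_sets -verbose
```

Nets that appear with high fanout but positive slack are usually poor candidates for replication, because replicating them adds registers without helping timing closure.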
Often, a better approach to reducing fanout is to use a balanced tree for the high fanout signals. Consider manually replicating registers based on the design hierarchy, because the cells included in a hierarchy are often placed together.
To restructure and reduce the number of control set trees and high fanout nets, you can use the opt_design Tcl command with one of the following options:
- -control_set_merge: This option aggressively combines the drivers of logically-equivalent control signals to a single driver.
- -merge_equivalent_drivers: This option merges logically-equivalent signals, including control signals, to a single driver.
These options are the reverse of fanout replication and result in nets that are better suited for module-based replication. This merge also works across multi-stage reset trees as shown in the following figure.
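The merge options described above can be tried as in the following sketch. It assumes a synthesized design is open in Vivado; the report file names are hypothetical. When opt_design is called with explicit options, the default optimizations may not run in the same call, so check the opt_design documentation in UG904 before replacing a plain opt_design step with this.

```tcl
# Record the control set count before merging (file name is a placeholder).
report_control_sets -file control_sets_before.rpt

# Merge the drivers of logically equivalent control signals.
opt_design -control_set_merge

# Alternative: merge all logically equivalent drivers, control signals included.
# opt_design -merge_equivalent_drivers

# Record the control set count again to compare against the first report.
report_control_sets -file control_sets_after.rpt
```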
After reducing the number of replicated objects, you can use the opt_design Tcl command to perform limited replication based on the hierarchy characteristics, with the following option:
- -hier_fanout_limit <arg>: This option replicates registers according to the hierarchy, where <arg> represents the fanout limit for the replication according to the logical hierarchy. For each hierarchical instance driven by the high fanout net, if the fanout within the hierarchy is greater than the specified limit, the net within the hierarchy is driven by a replica of the driver of the high fanout net. The replicated driver is placed in the same level of hierarchy as the original driver, and replication is not limited to control set registers.
The following figure shows replication on a clock enable net with a fanout of 60000 using opt_design -hier_fanout_limit 1000. Because each module SR_1K contains 1000 loads, the driver is replicated 59 times.
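A minimal sketch of the hierarchy-based replication step follows. It assumes an open synthesized design in Vivado; the limit of 1000 matches the example above, and the follow-up report is only one way to inspect the result.

```tcl
# Replicate drivers per hierarchical instance, using a fanout limit of 1000
# loads within each hierarchy.
opt_design -hier_fanout_limit 1000

# Inspect the remaining high fanout nets and the load types they drive.
report_high_fanout_nets -load_types -max_nets 25
```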
Fanout optimization is enabled by default in place_design. Replication occurs early in the placer flow and is based on placement information. Registers that drive more than 1000 loads and registers that drive DSPs, block RAMs, and UltraRAMs are considered for replication and are co-located with the loads if replication occurs. You can force the replication of a register or a LUT driving a net by adding the FORCE_MAX_FANOUT property to the net. The value of FORCE_MAX_FANOUT specifies the maximum physical fanout the net should have after the replication optimization.
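As an illustration, FORCE_MAX_FANOUT can be applied before placement as sketched below. The net name ctrl/clk_en and the value 500 are hypothetical placeholders, not recommendations, and the sketch assumes the design is open in Vivado and has not yet been placed.

```tcl
# Hypothetical net name; replace with a net from your own design.
set hf_net [get_nets ctrl/clk_en]

# Request that each net has at most 500 physical loads after replication.
set_property FORCE_MAX_FANOUT 500 $hf_net

# Replication of the driver then takes place during placement.
place_design
```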
You can force replication based on physical device attributes with the MAX_FANOUT_MODE property. Supported MAX_FANOUT_MODE values are CLOCK_REGION, SLR, and MACRO. For example, the MAX_FANOUT_MODE property with a value of CLOCK_REGION replicates the driver based on the physical clock region, and the loads placed into the same clock region are clustered together. For more information, see the Vivado Design Suite User Guide: Implementation (UG904).
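The property can be set on a net as in the following sketch. The net name ctrl/clk_en is a hypothetical placeholder, and the sketch assumes the property is applied before place_design runs.

```tcl
# Hypothetical net name; replace with a net from your own design.
set hf_net [get_nets ctrl/clk_en]

# Replicate the driver once per clock region that contains loads.
set_property MAX_FANOUT_MODE CLOCK_REGION $hf_net

# Alternative: replicate per SLR on SSI technology devices.
# set_property MAX_FANOUT_MODE SLR $hf_net
```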
For SSI technology devices, high-fanout drivers can be replicated for each SLR and optionally assigned to SLR-aligned Pblocks along with their loads. This technique helps reduce the impact of the SLR crossing delay and gives more freedom to place the replicated high fanout nets independently in each SLR.
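The SLR-aligned Pblock assignment mentioned above might look like the following sketch. It assumes the driver has already been replicated so that one replica serves each SLR; the Pblock name, the cell names ctrl/rst_slr0_reg and u_slr0_logic, and the choice of SLR0 are all hypothetical.

```tcl
# Create a Pblock that covers all of SLR0 (Pblock name is a placeholder).
create_pblock pblock_slr0
resize_pblock [get_pblocks pblock_slr0] -add {SLR0}

# Assign the SLR0 replica of the driver and the logic it feeds to that Pblock.
# Both cell names are placeholders for cells in your own design.
add_cells_to_pblock [get_pblocks pblock_slr0] \
    [get_cells {ctrl/rst_slr0_reg u_slr0_logic}]
```

The same pattern would be repeated for each remaining SLR, with its own replica and Pblock.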