During physical synthesis in the placer (PSIP), the placer can perform various
physical optimizations that prepare the netlist for later placement phases, based
on the initial placement of the design after the floorplanning stage. For example, in
fanout-based replication, the replicated driver can be co-located with its loads because
the initial placement is known. This alleviates the congestion that can be introduced when
replication is done before place_design, without knowledge of placement.
Optimizations are considered based on internal parameters.
For timing-based optimizations, the timing is evaluated and the optimization is
committed if timing improves. The following optimizations are available, as shown in
the following figure.
- SLR Replication
- High-fanout nets that connect a single driver to multiple loads can present timing challenges when these loads are distributed across different SLRs. SLR replication replicates the drivers of high-fanout nets that have critical loads in other SLRs. Drivers are selected based on timing estimates after the floorplanning phase of the placer. This optimization honors all soft and hard Pblock constraints.
- Control Set Optimization
- Performs control set reduction with more accurate placement location information. With a meaningful initial placement result, the flip-flops are distributed across the placement area, and the minimal resource usage for the flip-flops in each small region is calculated. For hotspot regions, the optimization finds an optimal solution to reduce resource usage. This reduces the downstream legalization effort of pushing cells farther apart and leaves more room for other optimizations. This phase is activated only with the placer's Explore directive.
- Auto Pipeline Insertion
- Auto Pipeline Insertion is a non-timing-driven optimization that inserts pipeline registers on user-marked nets. This feature is used to address timing closure challenges on specific buses and interfaces. After the pipeline stages have been inserted, the placer adjusts the pipeline locations to improve clock speed. See Auto-Pipelining for more information.
- Property-Based Retiming
- Property-based retiming provides user-controlled retiming through setting a
property on a register or LUT. This optimization is ideal for critical paths
with sufficient margin on timing startpoints or endpoints. Two properties
control retiming in PSIP: PSIP_RETIMING_BACKWARD with a value of TRUE performs
backward retiming, and PSIP_RETIMING_FORWARD with a value of TRUE performs forward
retiming. The properties can be applied to a register or LUT. When
PSIP_RETIMING_FORWARD with a value of TRUE is applied to a register, PSIP forward
retimes over all LUT loads driven by the Q pin of the register. When
PSIP_RETIMING_FORWARD with a value of TRUE is applied to a LUT primitive, the
register driving the LUT is moved to the output of the LUT. When
PSIP_RETIMING_BACKWARD with a value of TRUE is applied to a register, PSIP backward
retimes over the LUT driving the D pin of the register. Note that the backward
retiming property on the register does not trigger backward retiming over
LUTs that drive control set pins. When PSIP_RETIMING_BACKWARD with a value of TRUE is
applied to a LUT primitive, the register driven by the LUT is moved to the LUT
inputs. Multi-level retiming is supported by applying the property to all LUT
primitives along the path. All retimed cells have the PHYS_OPT_MODIFIED
property set to RETIMING.
Retiming does not work for the following:
- Moving logic across macros, such as block RAM, UltraRAM, and DSPs
- Registers packed in I/O sites
- Paths with different startpoint and endpoint clocks
- Paths with timing exceptions
- Paths with properties that prevent optimization, such as DONT_TOUCH and ASYNC_REG
- Very High-Fanout Optimization
- Very High-Fanout Optimization replicates registers driving high-fanout nets (fanout > 1000, slack < 2.0 ns).
- Critical Cell Optimization
- Critical Cell Optimization replicates cells in failing paths. If the loads on a specific cell are placed far apart, the cell might be replicated, with the new drivers placed closer to load clusters. This optimization often applies to nets driving large block RAM or UltraRAM arrays or a large number of DSPs, because the sites for these blocks are spread over a wider area of the device. High fanout is not a requirement for this optimization to occur; the criterion is slack < 0.5 ns.
- Fanout Optimization
- Nets with a MAX_FANOUT property value that is less than the
actual fanout of the net are considered for fanout optimization. The user can
force the replication of a register or a LUT driving a net by adding the
FORCE_MAX_FANOUT property to the net. The value of FORCE_MAX_FANOUT
specifies the maximum physical fanout the nets should have after the replication
optimization. The physical fanout in this case refers to the actual site pin
loads, not the logical loads. For example, if the replica drives multiple LUTRAM
loads that are all grouped in the same slice, the combined fanout is 1 for
all of the LUTRAMs in that slice. FORCE_MAX_FANOUT forces the
replication during physical synthesis regardless of the slack of the signal. The
user can also force replication based on physical device attributes with the
MAX_FANOUT_MODE property, which can take the value CLOCK_REGION,
SLR, or MACRO. For example, MAX_FANOUT_MODE with a value of
CLOCK_REGION replicates the driver based on the physical clock region, so that loads
placed in the same clock region are clustered together. The MAX_FANOUT_MODE
property takes precedence over the FORCE_MAX_FANOUT property. Physical
synthesis tries to honor both by applying the MAX_FANOUT_MODE-based optimization
first; the replicated drivers then inherit the FORCE_MAX_FANOUT
property for further replication within a clock region. This is illustrated in
the following figure, where a register drives four loads: two registers
and two MACRO loads (block RAM, UltraRAM, or DSP). Replication provides separate
drivers for the register loads and the MACRO loads, and then the driver for the MACRO
loads is replicated until the FORCE_MAX_FANOUT property value is satisfied.
Figure 2. Applying MAX_FANOUT_MODE with value MACRO together with FORCE_MAX_FANOUT
Note: This optimization happens early in the placer. In the later stages of the placer, as the timing accuracy improves, the replicated source registers, the load registers, or both may be moved to different clock regions or SLRs if the timing estimate improves.
- DSP Register Optimization
- DSP Register Optimization can move registers out of the DSP cell into the logic array or from logic to DSP cells if it improves the delay on the critical path.
- Shift Register to Pipeline Optimization
- Shift Register to Pipeline Optimization turns a shift register
with a fixed length greater than 1 into a dynamically adjusted register pipeline. It
pulls out more registers when the distance to cover is longer. The registers are
placed optimally to balance timing paths. The latency is unaffected.
Only SRLs with the PHYS_SRL2PIPELINE property set to TRUE are considered for this optimization. The property must be set on the SRL cell and should be set on all bits of a bus so that all the bits are optimized. Registers are pulled from the Q pin of the SRL.
- Block RAM Register Optimization
- Block RAM Register Optimization can move registers out of the block RAM cell into the logic array or from logic to block RAM cells if it improves the delay on the critical path.
- URAM Register Optimization
- UltraRAM Register Optimization can move registers out of the UltraRAM cell into the logic array or from logic to UltraRAM cells if it improves the delay on the critical path.
- Dynamic/Static Region Interface Net Replication
- Replicates drivers on paths that cross the boundary from the static design to a reconfigurable module in the DFX flow.
- Equivalent Driver Rewire Optimization
- This optimization redistributes loads between logically-equivalent drivers to minimize routing overlap and provide a more optimal co-location of drivers and loads. This helps reduce utilization and congestion and allows later placer stages to move drivers and loads more optimally to improve QoR.
- Physical Optimization of SLR crossings
- The physical optimization of SLR crossings pulls registers
out of an SRL if there is an SLR crossing. It uses the USER_SLL_REG=TRUE
property to guide the tool toward an optimal FF->FF SLR crossing, which
increases the performance of the crossing. If, after the optimization, the SRL depth
becomes one, the SRL is converted into a regular register.
The optimization occurs when:
- The fanout of the SRL is 1
- The clock frequency meets the minimum threshold of 250 MHz
- The SRL has a minimum depth of 2
- The SRL is not LUT-combined
- The source and destination cells are in neighboring SLRs
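The conditions listed above can be read as a simple eligibility check. The following Python sketch is only an illustration of that reading, not the tool's implementation. It assumes that all five conditions must hold together, that the 250 MHz threshold is inclusive, and that SLRs are numbered with consecutive integers so that neighboring SLRs differ by 1. The `SrlCrossing` record and `is_slr_crossing_candidate` function are hypothetical names introduced for this example.

```python
from dataclasses import dataclass

MIN_CLOCK_MHZ = 250.0  # minimum clock frequency stated in the list above
MIN_SRL_DEPTH = 2      # minimum SRL depth stated in the list above


@dataclass
class SrlCrossing:
    """Hypothetical record describing one SRL and the cell it drives."""
    fanout: int          # number of loads on the SRL output
    clock_mhz: float     # clock frequency of the path
    depth: int           # SRL depth
    lut_combined: bool   # True if the SRL shares a LUT site with another cell
    source_slr: int      # SLR index of the source cell (assumed integer index)
    dest_slr: int        # SLR index of the destination cell


def is_slr_crossing_candidate(srl: SrlCrossing) -> bool:
    """Return True when every listed condition holds (assumed conjunction)."""
    return (
        srl.fanout == 1
        and srl.clock_mhz >= MIN_CLOCK_MHZ
        and srl.depth >= MIN_SRL_DEPTH
        and not srl.lut_combined
        and abs(srl.source_slr - srl.dest_slr) == 1
    )
```

For example, an SRL of depth 4 with a single load, clocked at 300 MHz, not LUT-combined, and driving from SLR 0 into SLR 1 passes this check, while the same SRL driving from SLR 0 into SLR 2 does not.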
For more information on these optimizations, see Available Physical Optimizations in the Physical Optimization section. Physical synthesis in the placer runs by default in all of the placer directives. At the end of the physical synthesis phase, a table shows a summary of the optimizations.
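To make the notion of physical fanout from the Fanout Optimization description more concrete, the following Python sketch counts loads per site rather than per logical pin. It is a simplified model under stated assumptions: each load is represented only by the name of the site it is placed in, every site contributes exactly one site pin load, and the number of drivers is estimated as a simple ceiling division. The function names and the site names are hypothetical, and the actual tool may cluster loads differently.

```python
import math


def physical_fanout(load_sites):
    """Count distinct sites among the loads.

    Simplifying assumption: all loads in the same site (for example,
    LUTRAMs grouped in one slice) count as a single physical load.
    """
    return len(set(load_sites))


def min_drivers(load_sites, force_max_fanout):
    """Estimate the fewest drivers needed so that no driver exceeds
    force_max_fanout physical loads (a lower bound, not the tool's result)."""
    if force_max_fanout < 1:
        raise ValueError("force_max_fanout must be at least 1")
    return max(1, math.ceil(physical_fanout(load_sites) / force_max_fanout))


# Four LUTRAM loads share one slice; two other loads sit in their own slices.
loads = ["SLICE_X0Y0"] * 4 + ["SLICE_X1Y0", "SLICE_X2Y0"]
print(physical_fanout(loads))   # logical fanout is 6, physical fanout is 3
print(min_drivers(loads, 2))    # at least 2 drivers for a limit of 2
```

In this model the four LUTRAMs in the same slice contribute a fanout of 1, which matches the example given in the Fanout Optimization description.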
When an optimization is performed on a primitive cell, the PHYS_OPT_MODIFIED property of the cell is updated to reflect the optimizations performed on the cell. When multiple optimizations are performed on the same cell, the PHYS_OPT_MODIFIED value contains a list of the optimizations in the order they occurred. The following table lists the PHYS_OPT_MODIFIED and PHYS_OPT_SKIPPED values that correspond to each physical synthesis optimization.
Optimization | PHYS_OPT_MODIFIED and PHYS_OPT_SKIPPED Value |
---|---|
Auto Pipeline Insertion | AUTOPIPELINE |
Block RAM Register Optimization | BRAM_REGISTER_OPT |
Control Set Optimization | CONTROL_SET_OPT |
Critical Cell Optimization | CRITICAL_CELL_OPT |
DSP Register Optimization | DSP_REGISTER_OPT |
Equivalent Driver Rewire Optimization | EQU_REWIRE_OPT |
Fanout Optimization | FANOUT_OPT |
Property-Based Retiming | RETIMING |
Shift Register Optimization | SHIFT_REGISTER_OPT |
Shift Register to Pipeline Optimization | SHIFT_REGISTER_TO_PIPELINE |
URAM Register Optimization | URAM_REGISTER_OPT |
Very High-Fanout Optimization | FANOUT_OPT |
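When post-processing property values exported from a design, the table above can be used as a lookup. The following Python sketch decodes a PHYS_OPT_MODIFIED string into optimization names. The mapping comes from the table; the separator between list entries is an assumption (whitespace or commas), and `decode_phys_opt_modified` is a hypothetical helper, not a tool command.

```python
# Value-to-optimization mapping taken from the table above.
# FANOUT_OPT is shared by two optimizations, so it cannot be told apart here.
PHYS_OPT_VALUES = {
    "AUTOPIPELINE": "Auto Pipeline Insertion",
    "BRAM_REGISTER_OPT": "Block RAM Register Optimization",
    "CONTROL_SET_OPT": "Control Set Optimization",
    "CRITICAL_CELL_OPT": "Critical Cell Optimization",
    "DSP_REGISTER_OPT": "DSP Register Optimization",
    "EQU_REWIRE_OPT": "Equivalent Driver Rewire Optimization",
    "FANOUT_OPT": "Fanout Optimization or Very High-Fanout Optimization",
    "RETIMING": "Property-Based Retiming",
    "SHIFT_REGISTER_OPT": "Shift Register Optimization",
    "SHIFT_REGISTER_TO_PIPELINE": "Shift Register to Pipeline Optimization",
    "URAM_REGISTER_OPT": "URAM Register Optimization",
}


def decode_phys_opt_modified(value):
    """Translate a PHYS_OPT_MODIFIED string into optimization names,
    preserving the order in which the entries appear.

    Assumption: entries are separated by whitespace and/or commas.
    """
    tokens = value.replace(",", " ").split()
    return [PHYS_OPT_VALUES.get(tok, "unknown: " + tok) for tok in tokens]


print(decode_phys_opt_modified("FANOUT_OPT RETIMING"))
```

Because the property records optimizations in the order they occurred, the returned list reads as a history of the cell: in the example, replication for fanout came first and retiming second.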