As you code your design, be aware of the logic being inferred. Watch for the
following conditions, which can call for additional pipelining:
Cones of logic with large fanin: for example, code that requires large buses
or several combinational signals to compute an output.
Blocks with restricted placement or slow clock-to-out or large setup
requirements: for example, block RAMs without output registers or arithmetic
code that is not appropriately pipelined.
Forced placement that causes long routes: for example, a pinout that forces a
route across the chip might require pipelining to allow for high-speed
operation.
Logic composed of large XOR functions: large XOR functions often have high
switching rates that can cause high dynamic power dissipation. Pipelining these
functions can reduce switching, which lowers the power consumption of the
circuit.
In the following figure, the clock speed is limited by:
Clock-to-out time of the source flip-flop
Logic delay through four levels of logic
Routing associated with the four function generators
Setup time of the destination register
Figure 1. Before Pipelining
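The four contributions listed above add up to the minimum clock period of the path. As a rough sketch, the sum can be written as follows. The symbol names are chosen here for illustration and are not taken from the tool documentation, and the sketch assumes one routing delay per function generator:

```latex
\[
T_{\text{clk,min}} = T_{\text{cko}}
  + \sum_{i=1}^{4} \left( T_{\text{logic},i} + T_{\text{route},i} \right)
  + T_{\text{su}},
\qquad
f_{\max} = \frac{1}{T_{\text{clk,min}}}
\]
```

Here T_cko is the clock-to-out time of the source flip-flop, T_logic,i and T_route,i are the logic and routing delays of the i-th function generator, and T_su is the setup time of the destination register. Reducing the number of logic levels between registers shrinks the sum and raises the achievable clock frequency.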
Use one of the following methods to ensure that your design uses pipeline
registers correctly:
In your RTL code, add the registers before or after the logic to be
retimed, preferably within the hierarchy.
Use the Vivado synthesis global
retiming or BLOCK_SYNTH.RETIMING option, which analyzes the timing of a path and
moves the registers to improve timing, if possible.
Alternatively, for more control, use the retiming_forward and
retiming_backward synthesis attributes. You can add these attributes on
specific registers to force the tool to retime through combinational logic
regardless of the timing score of the logic. For more information on these
attributes, see the Vivado Design Suite User Guide: Synthesis (UG901).
The following figure shows the pipelining after adding extra
registers.
Figure 2. Pipelining After Adding Extra Registers
The following figure shows the same data path as the Before Pipelining figure,
after retiming. Because each flip-flop is contained in the same slice as its
function generator, the clock speed is limited by the clock-to-out time of the
source flip-flop, the logic delay through one level of logic, one routing delay,
and the setup time of the destination register. In this example, the system
clock runs faster after pipelining and retiming than in the original design.
Figure 3. Pipelining After Retiming
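To make the comparison concrete, the following back-of-the-envelope sketch adds up the delay components for the path before pipelining (four logic levels) and after retiming (one logic level). All delay values are assumed for illustration and are not device data. The function names are hypothetical, and the sketch assumes one routing delay per logic level:

```python
# Illustrative delays in nanoseconds. These are ASSUMED values for the
# sake of the arithmetic, not figures from any device data sheet.
T_CKO = 0.4    # clock-to-out time of the source flip-flop
T_LOGIC = 0.3  # delay through one function generator
T_ROUTE = 0.5  # one routing delay (assumed: one per logic level)
T_SETUP = 0.1  # setup time of the destination register


def min_period_ns(logic_levels: int) -> float:
    """Minimum clock period for a register-to-register path."""
    return T_CKO + logic_levels * (T_LOGIC + T_ROUTE) + T_SETUP


def fmax_mhz(period_ns: float) -> float:
    """Maximum clock frequency in MHz for a given period in ns."""
    return 1000.0 / period_ns


before = min_period_ns(4)  # four levels of logic between registers
after = min_period_ns(1)   # one level of logic after retiming

print(f"before: {before:.2f} ns, {fmax_mhz(before):.0f} MHz")
print(f"after:  {after:.2f} ns, {fmax_mhz(after):.0f} MHz")
```

With these assumed numbers the fixed costs (clock-to-out and setup) stay the same, so the gain comes entirely from removing three logic-plus-routing stages from the critical path. Real paths should be checked against the timing report rather than an estimate like this.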
The following code example shows how to use the retiming attributes to force
the specific retiming shown in the Pipelining After Retiming figure.