Paths from/to/between dedicated blocks and macro primitives (e.g., DSP, block RAM, or UltraRAM) need special attention because these primitives usually have the following timing characteristics:
- Higher setup/hold/clock-to-output timing arc values for some pins. For example, a block RAM has a clock-to-output delay around 1.5 ns without the optional output register and 0.4 ns with the optional output register. Review the data sheet of your target device series for complete details.
- Higher routing delays than regular FD/LUT connections.
- Higher clock skew variation than regular FD-FD paths.
Also, their availability and site locations are restricted compared to CLB slices, which usually makes their placement more challenging and often incurs some QoR penalty.
For these reasons, AMD recommends the following:
- Pipeline paths from and to dedicated blocks and macro primitives as much as possible.
- Restructure the combinational logic connected to these cells to reduce the logic levels by at least 1 or 2 cells if latency incurred by pipelining is a concern.
- Meet setup timing by at least 500 ps on these paths before placement.
- Replicate cones of logic connected to too many dedicated blocks or macro primitives if they need to be placed far apart.
- When the design has tight timing requirements to, within, or from a DSP block, run opt_design -dsp_register_opt to move registers to a more timing-optimal position.
  Note: Because timing is approximate during opt_design, you might also need to run phys_opt_design -dsp_register_opt to correct movements where timing was not accurately represented at the pre-placement stage.
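
The recommendations above can be sketched as a Vivado Tcl sequence. This is an illustrative sketch, not an official flow. It assumes a synthesized design is already open, that block RAM, UltraRAM, and DSP primitives on the target device have reference names matching RAMB*, URAM*, and DSP*, and that a 0.5 ns margin corresponds to the 500 ps guideline. The helper proc is_macro_cell is a hypothetical name introduced only for this example.

```tcl
# Sketch only: assumes a synthesized design is open in Vivado.

# Hypothetical helper: returns 1 if the cell is a block RAM, UltraRAM, or DSP
# primitive. The name patterns are assumptions; adjust them for your device family.
proc is_macro_cell {cell} {
    if {[llength $cell] == 0} { return 0 }
    set ref [get_property REF_NAME $cell]
    foreach pattern {RAMB* URAM* DSP*} {
        if {[string match $pattern $ref]} { return 1 }
    }
    return 0
}

# Logic optimization, letting the tool move registers into or out of DSP blocks.
opt_design -dsp_register_opt

# Before placement, list setup paths with less than 0.5 ns of slack that start
# or end at a dedicated block or macro primitive.
set margin_ns 0.5
foreach path [get_timing_paths -setup -max_paths 1000 -slack_lesser_than $margin_ns] {
    # -quiet avoids errors when the startpoint or endpoint is a port, not a pin.
    set start_cell [get_cells -quiet -of_objects [get_property STARTPOINT_PIN $path]]
    set end_cell   [get_cells -quiet -of_objects [get_property ENDPOINT_PIN $path]]
    if {[is_macro_cell $start_cell] || [is_macro_cell $end_cell]} {
        puts "[get_property SLACK $path] ns: $start_cell -> $end_cell"
    }
}

place_design

# Optional: revisit the DSP register moves now that placement-based timing is known.
phys_opt_design -dsp_register_opt
```

Paths printed by the loop are candidates for pipelining, logic restructuring, or replication before continuing with placement.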