Optimizing Paths with Dedicated Blocks and Macro Primitives

UltraFast Design Methodology Guide for FPGAs and SoCs (UG949)

Document ID: UG949

Release Date: 2023-06-07

Version: 2023.1 English

Paths from/to/between dedicated blocks and macro primitives (e.g., DSP, block RAM, or UltraRAM) need special attention because these primitives usually have the following timing characteristics:

Higher setup/hold/clock-to-output timing arc values for some pins. For example, a block RAM has a clock-to-output delay around 1.5 ns without the optional output register and 0.4 ns with the optional output register. Review the data sheet of your target device series for complete details.
Higher routing delays than regular FD/LUT connections.
Higher clock skew variation than regular FD-FD paths.

Also, their availability and site locations are restricted compared to CLB slices, which usually makes their placement more challenging and often incurs some QoR penalty.
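As a rough illustration of the first characteristic, the arithmetic below assumes a hypothetical 400 MHz clock (2.5 ns period) and ignores clock skew and uncertainty. The clock frequency is an assumption chosen for the example, and the clock-to-output delays are the approximate block RAM figures quoted above, not data sheet values. The budget is what remains for logic, routing, and destination setup on a path that starts at a block RAM output:

```latex
\begin{aligned}
T_{\text{budget}} &= T_{\text{clk}} - T_{\text{clock-to-output}} \\
\text{without output register:}\quad T_{\text{budget}} &\approx 2.5\,\text{ns} - 1.5\,\text{ns} = 1.0\,\text{ns} \\
\text{with output register:}\quad T_{\text{budget}} &\approx 2.5\,\text{ns} - 0.4\,\text{ns} = 2.1\,\text{ns}
\end{aligned}
```

Under these assumptions, enabling the optional output register roughly doubles the time available to the downstream path, at the cost of one extra cycle of latency.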

For these reasons, AMD recommends the following:

Pipeline paths from and to dedicated blocks and macro primitives as much as possible.
If the latency added by pipelining is a concern, restructure the combinational logic connected to these cells to remove at least one or two logic levels.
Meet setup timing with at least 500 ps of positive slack on these paths before placement.
Replicate cones of logic that are connected to many dedicated blocks or macro primitives when those blocks need to be placed far apart.
When the design has tight timing requirements to, within, or from a DSP block, run opt_design -dsp_register_opt to move registers to a more timing-optimal position.
Note: Because timing is approximate during opt_design, you might also need to run phys_opt_design -dsp_register_opt to correct movements where timing was not accurately represented at the pre-placement stage.
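
The recommendations above can be sketched as a non-project Vivado Tcl flow. This is a minimal sketch, not a reference flow: it assumes a synthesized design is already open, it uses all_rams and all_dsps to select the cells of interest, and the 20-path report limit is an arbitrary choice. Adjust it to match your own scripts.

```tcl
# Assumption: a synthesized design is already open in Vivado
# (for example, after synth_design or open_checkpoint).

# 1. Logic optimization with DSP register optimization.
#    Note: naming an individual optimization runs only that optimization,
#    so add the other opt_design options your flow needs.
opt_design -dsp_register_opt

# 2. Pre-placement check: report setup paths from or to RAM and DSP cells
#    whose estimated slack is below 0.5 ns (the 500 ps margin).
#    all_rams can also return distributed RAM cells; filter the list if only
#    block RAM and UltraRAM are of interest.
foreach cells [list [all_rams] [all_dsps]] {
    if {[llength $cells] == 0} { continue }
    report_timing -from $cells -delay_type max -slack_lesser_than 0.5 -max_paths 20
    report_timing -to   $cells -delay_type max -slack_lesser_than 0.5 -max_paths 20
}

# 3. Place the design, then repeat the DSP register optimization now that
#    timing reflects actual placement.
place_design
phys_opt_design -dsp_register_opt

# 4. Route and review the final timing.
route_design
report_timing_summary
```

Paths listed in step 2 are candidates for pipelining, logic restructuring, or replication before you proceed to placement.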