Use parallel clock buffers to achieve the following:
- Ensure predictable placement across implementation runs.
  When parallel clock buffers are driven directly by the same input clock port, MMCM, PLL, or GT*_CHANNEL, the buffers are always placed in the same clock region as their driver, regardless of netlist changes or variations in logic placement.
- Match the insertion delays between parallel branches of the clock tree.
  Xilinx recommends parallel buffers rather than cascaded clock buffers, especially when synchronous paths exist between the branches. With cascaded buffers, the clock insertion delay is not matched between the branches of the clock tree, even when the CLOCK_DELAY_GROUP or USER_CLOCK_ROOT constraints are used. The result can be high clock skew, which makes timing closure difficult or even impossible.
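The following Python sketch is a toy model of the structural reason cascaded buffers leave skew that parallel buffers avoid. Each clock branch is modeled as a sum of delay segments. All delay values and names (T_DRIVER_TO_BUF, T_BUF, T_BUF_TO_BUF, T_TREE) are hypothetical placeholders, not device data. Real insertion delays come from the implementation tool's timing reports, and the model ignores any balancing the tool performs.

```python
# Toy model of clock insertion delay: parallel vs. cascaded clock buffers.
# ASSUMPTION: every delay below is a hypothetical placeholder in ns, not device data.
T_DRIVER_TO_BUF = 0.05  # routing from the MMCM output to a clock buffer input
T_BUF = 0.10            # delay through one clock buffer (e.g. a BUFGCE)
T_BUF_TO_BUF = 0.40     # routing from one buffer output to a cascaded buffer input
T_TREE = 1.20           # distribution from a buffer to its loads (same for every branch)


def insertion_delay(segments):
    """Insertion delay of one branch: the sum of its delay segments."""
    return sum(segments)


def skew(branch_a, branch_b):
    """Clock skew magnitude: absolute difference of two branches' insertion delays."""
    return abs(insertion_delay(branch_a) - insertion_delay(branch_b))


# Parallel: both buffers are driven directly by the same MMCM output,
# so both branches contain exactly the same segments.
parallel_a = [T_DRIVER_TO_BUF, T_BUF, T_TREE]
parallel_b = [T_DRIVER_TO_BUF, T_BUF, T_TREE]

# Cascaded: branch B's buffer is driven by branch A's buffer, so branch B
# also pays the buffer-to-buffer routing and a second buffer delay.
cascaded_a = [T_DRIVER_TO_BUF, T_BUF, T_TREE]
cascaded_b = [T_DRIVER_TO_BUF, T_BUF, T_BUF_TO_BUF, T_BUF, T_TREE]

if __name__ == "__main__":
    print(f"parallel skew: {skew(parallel_a, parallel_b):.2f} ns")
    print(f"cascaded skew: {skew(cascaded_a, cascaded_b):.2f} ns")
```

With these placeholder values, the parallel skew is zero, while the cascaded skew equals T_BUF_TO_BUF + T_BUF (0.50 ns): the extra segments appear on only one branch, so no choice of per-segment values can make them cancel.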
The following figure shows three parallel BUFGCE buffers driven by the MMCM CLKOUT0 port.