Cascaded Clock Buffers - 2024.1 English

UltraFast Design Methodology Guide for FPGAs and SoCs (UG949)

Document ID: UG949
Release Date: 2024-06-26
Version: 2024.1 English

In general, AMD does not recommend using cascaded buffers to artificially increase the delay and reduce the skew between unrelated clock tree branches. Unlike connections between BUFGCTRLs, other clock buffer connections do not have a dedicated path in the architecture. Therefore, the relative placement of clock buffers is not predictable, and all placement rules take precedence over placing unconstrained cascaded buffers.

However, you can use cascaded clock buffers to achieve the following:

  • Route the clock to another clock buffer located in a different clock region.

    This method is typical when using a clock multiplexer for clocks generated by MMCMs located in different clock regions. Although one of the MMCMs can directly drive the BUFGCTRL (BUFGMUX), the other MMCM requires an intermediate clock buffer to route the clock signal to the other region. The following figure shows an example.

    Figure 1. Routing the Clock to Another Clock Region

  • Balance the number of clock buffer levels across the clock tree branches when there is a synchronous path between those branches.

    For example, consider an MMCM clock called clk0 that drives both group A (sequential cells driven via a BUFGCTRL located in a different clock region) and group B (sequential cells). To better match the delay between the branches, insert a BUFGCE for group B and place it in the same clock region as the BUFGCTRL. This ensures that the synchronous paths between group A and group B have a controlled amount of skew. The following figure shows an example.

    Note: The Vivado logic optimization command opt_design is not aware of the timing relationship between timing clocks and clock network branches. As a result, opt_design removes as many cascaded or redundant clock buffers as possible. In this example, opt_design removes BUFGCE_inst_1 unless you set a DONT_TOUCH="TRUE" property on it. If there are only asynchronous paths between the clock tree branches, the branches do not need to be balanced as long as there is proper synchronization circuitry on the receiving clock domain.

    Figure 2. Balancing Clock Trees for Synchronous Paths Between Clock Regions

  • Build clock multiplexers as described in Clock Multiplexing.
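
The balancing example above can be expressed as XDC constraints. The following is a minimal sketch, not a definitive implementation. It assumes an UltraScale-class device on which the CLOCK_REGION property can be set on clock buffer cells. BUFGCE_inst_1 is the cell named in the note above. The cell name BUFGCTRL_inst and the clock region X1Y2 are hypothetical placeholders that you must replace with values from your design.

```tcl
# Prevent opt_design from removing the balancing buffer for group B.
set_property DONT_TOUCH TRUE [get_cells BUFGCE_inst_1]

# Assumption: BUFGCTRL_inst is the buffer that drives group A, and X1Y2 is
# the clock region chosen for it. Both names are placeholders.
# Placing both buffers in the same clock region keeps the two branches matched.
set_property CLOCK_REGION X1Y2 [get_cells {BUFGCTRL_inst BUFGCE_inst_1}]
```

Apply the DONT_TOUCH constraint before opt_design runs. Otherwise, the buffer might already be removed when the constraint is evaluated.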

To reduce the variation of insertion delays and skew, AMD recommends the following when using cascaded clock buffers:

  • Keep the cascaded buffers in the same or adjacent clock regions.
  • When clock tree branches are balanced, assign all the clock buffers of the same level to the same clock region.

Note: If cascading clock buffers is absolutely required, AMD recommends using two cascaded BUFGCTRLs instead of cascaded BUFGCEs. Using dedicated routing, you can cascade two adjacent BUFGCTRLs with minimum delay when both BUFGCTRLs are placed inside the same clock region.
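
The placement recommendations above can be sketched as per-level clock region assignments. This is an illustrative fragment under stated assumptions, not a required flow. All cell names (clk_gen/bufg_l1_a and so on) and the clock regions X2Y1 and X2Y2 are hypothetical, and the CLOCK_REGION property on clock buffer cells is assumed to be available for the target device.

```tcl
# Hypothetical names: two balanced branches, each with two buffer levels.
# All first-level buffers share one clock region.
set_property CLOCK_REGION X2Y1 [get_cells {clk_gen/bufg_l1_a clk_gen/bufg_l1_b}]

# All second-level buffers share one clock region, adjacent to the first.
set_property CLOCK_REGION X2Y2 [get_cells {clk_gen/bufg_l2_a clk_gen/bufg_l2_b}]
```

After placement, you can run report_clock_utilization to confirm where the buffers were placed.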