Cascaded Clock Buffers - 2024.1 English

Versal Adaptive SoC Hardware, IP, and Platform Development Methodology Guide (UG1387)

Document ID: UG1387
Release Date: 2024-06-19
Version: 2024.1 English

In general, AMD does not recommend using cascaded buffers to artificially increase the delay and reduce the skew between unrelated clock tree branches. Unlike connections between BUFGCTRLs, other clock buffer connections do not have a dedicated path in the architecture. Therefore, the relative placement of clock buffers is not predictable, and all placement rules take precedence over placing unconstrained cascaded buffers.

However, you can use cascaded clock buffers to achieve the following:

  • Route from an XPIO corner bank to a clock resource located in a different clock region.

    In Versal devices, XPIO corner banks cannot connect to general fabric interconnect and have limited global clocking features. For example, an XPIO corner bank is able to access a general-purpose BUFGCE but is unable to access BUFGCTRL and BUFGCE_DIV resources. The following figure shows the output of an IOB-MMCM-BUFG path reaching parallel BUFGCE and BUFGCE_DIV in another XPIO clock region for insertion delay matching.

    Figure 1. XPIO Corner Bank Cascaded Clock Buffers

    Following are the constraints to implement the clocking topology for the corner bank:

    #PORT clkIn is in Corner Bank GCIO
    set_property PACKAGE_PIN AF35 [get_ports clkIn]
    
    #prevent opt_design removal of the BUFG_inst to BUFG_casc_inst cascade
    set_property DONT_TOUCH TRUE [get_cells {BUFG_inst BUFG_casc_inst}]
    
    #design of the clock structures
    set_property CLOCK_DEDICATED_ROUTE SAME_CMT_ROW [get_nets -of [get_pins BUFG_inst/O]]
    set_property CLOCK_REGION X6Y0 [get_cells BUFGCE_DIV_casc_inst]
    set_property CLOCK_REGION X6Y0 [get_cells BUFG_casc_inst]
    set_property CLOCK_DELAY_GROUP myDelayGrp [get_nets -of [get_pins {BUFGCE_DIV_casc_inst/O BUFG_casc_inst/O}]]
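
    Because the relative placement of cascaded buffers is not predictable, it can help to confirm where the buffers landed once the design is placed. The following is a minimal sketch, assuming the cell names from the constraints above and a placed design open in the Vivado Tcl console. It is an interactive check, not part of the XDC file.

    ```tcl
    # Run after place_design; cell names are taken from the constraints above.
    foreach name {BUFG_inst BUFG_casc_inst BUFGCE_DIV_casc_inst} {
        set cell [get_cells -quiet $name]
        if {[llength $cell] == 0} {
            puts "$name: not found (removed or renamed?)"
            continue
        }
        puts "$name: clock region [get_clock_regions -quiet -of_objects $cell]"
    }

    # Both delay-matched nets should report the same group name (myDelayGrp).
    foreach net [get_nets -of_objects [get_pins {BUFGCE_DIV_casc_inst/O BUFG_casc_inst/O}]] {
        puts "$net: CLOCK_DELAY_GROUP = [get_property CLOCK_DELAY_GROUP $net]"
    }
    ```

    With the constraints above, BUFG_casc_inst and BUFGCE_DIV_casc_inst would be expected in clock region X6Y0.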
  • Route the clock to another clock buffer located in a different clock region.

    This method is typical when using a clock multiplexer for clocks generated by MMCMs located in different clock regions. Although one of the MMCMs can directly drive the BUFGCTRL (BUFGMUX), the other MMCM requires an intermediate clock buffer to route the clock signal to the other region. The following figure shows an example.

    Figure 2. Routing the Clock to Another Clock Region

    Following are the constraints to implement the clocking topology for the clock multiplexer:

    #PORT clkIn_1 is in XPIO ClockRegion X3Y0 GCIO
    set_property PACKAGE_PIN AU27 [get_ports clkIn_1]
    
    #PORT clkIn_2 is in XPIO ClockRegion X8Y0 GCIO
    set_property PACKAGE_PIN AJ5 [get_ports clkIn_2]
    
    #Guide placement of BUFGMUX
    set_property CLOCK_DEDICATED_ROUTE SAME_CMT_ROW [get_nets -of [get_pins BUFG_inst_2/O]]
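
    The constraints above leave the placement of the multiplexer to the tools. If the placement needs to be fixed explicitly, one option is to assign the multiplexer to a clock region with the same CLOCK_REGION property used in the corner bank example. This is a sketch only: the cell name BUFGMUX_inst and the region X3Y0 are assumptions based on the clkIn_1 location, and both need to be adapted to the actual design.

    ```tcl
    # Assumption: the BUFGCTRL (BUFGMUX) cell is named BUFGMUX_inst and should sit
    # in the clock region of clkIn_1 (X3Y0), next to the MMCM that drives it directly.
    set_property CLOCK_REGION X3Y0 [get_cells BUFGMUX_inst]
    ```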
  • Balance the number of clock buffer levels across the clock tree branches when there is a synchronous path between those branches.

    For example, consider a clock on the output of MMCME5_inst_2 that drives both group_A (sequential cells driven via a BUFGCTRL located in a different clock region) and group_B (sequential cells). To better match the delay between the branches, insert a BUFGCE for group_B and place it in the same clock region as the BUFGCTRL. This ensures that the synchronous paths between group_A and group_B have a controlled amount of skew. The following figure shows an example.

    Note: The Vivado tools logic optimization command opt_design is not aware of the timing relationship between timing clocks and clock network branches. As a result, opt_design removes as many cascaded or redundant clock buffers as possible. In this example, opt_design removes BUFG_inst_casc_2 unless you set the DONT_TOUCH property to TRUE on it. If there are only asynchronous paths between the clock tree branches, the branches do not need to be balanced as long as there is proper synchronization circuitry on the receiving clock domain.

    Figure 3. Balancing Clock Trees for Synchronous Paths Between Clock Regions

    Following are the constraints to implement the clock tree balancing circuit:

    #PORT clkIn_1 is in XPIO ClockRegion X3Y0 GCIO
    set_property PACKAGE_PIN AU27 [get_ports clkIn_1]
    
    #PORT clkIn_2 is in XPIO ClockRegion X8Y0 GCIO
    set_property PACKAGE_PIN AJ5 [get_ports clkIn_2]
    
    #allow for routing from BUFG_inst_2 (X8Y0) to BUFG_inst_casc_2 (X3Y0) and prevent optimization
    set_property CLOCK_DEDICATED_ROUTE SAME_CMT_ROW [get_nets -of [get_pins BUFG_inst_2/O]]
    set_property CLOCK_REGION X3Y0 [get_cells BUFG_inst_casc_2]
    set_property DONT_TOUCH TRUE [get_cells BUFG_inst_casc_2]
    
    #balance output of BUFGMUX and BUFG_inst_casc_2, both placed in X3Y0
    set_property CLOCK_DELAY_GROUP myDelayGrp [get_nets -of [get_pins {BUFG_inst_casc_2/O BUFGMUX_inst/O}]]
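
    Because opt_design removes cascaded buffers that are not protected, a quick check after optimization can confirm that the balancing buffer is still present. The following is a minimal sketch, assuming the cell name BUFG_inst_casc_2 from the constraints above and a design open in the Vivado Tcl console after opt_design. It is an interactive check, not part of the XDC file.

    ```tcl
    # Run after opt_design; BUFG_inst_casc_2 is the cell name used in the constraints above.
    set casc [get_cells -quiet BUFG_inst_casc_2]
    if {[llength $casc] == 0} {
        puts "BUFG_inst_casc_2 is missing: check that DONT_TOUCH was applied before opt_design"
    } else {
        puts "BUFG_inst_casc_2 kept, DONT_TOUCH = [get_property DONT_TOUCH $casc]"
    }
    ```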
    
  • Build clock multiplexers.

To reduce the variation of insertion delays and skew, AMD recommends the following when using cascaded clock buffers:

  • Keep the cascaded buffers in the same or adjacent clock regions.
  • When clock tree branches are balanced, assign all the clock buffers of the same level to the same clock region.

    Note: If cascaded buffers are absolutely required, AMD recommends using two cascaded BUFGCTRLs instead of cascaded BUFGCEs. Using dedicated routing, you can cascade two adjacent BUFGCTRLs with minimum delay when both BUFGCTRLs are placed inside the same clock region.
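
To review a design against these recommendations, it can be useful to list every clock buffer that drives another clock buffer, together with the clock region of each. The following is a rough sketch for the Vivado Tcl console on a placed design. Matching buffers by the name pattern *BUFG* on REF_NAME is an assumption made for brevity and might need adjusting for a given design.

```tcl
# Heuristic: treat any leaf cell whose REF_NAME contains "BUFG" as a clock buffer.
set buf_filter {REF_NAME =~ *BUFG*}

foreach buf [get_cells -hierarchical -filter $buf_filter] {
    # Net(s) driven by the output pin(s) of this buffer
    set out_pins [get_pins -quiet -of_objects $buf -filter {DIRECTION == OUT}]
    set out_nets [get_nets -quiet -of_objects $out_pins]
    if {[llength $out_nets] == 0} { continue }

    # Leaf load pins on those nets, then keep only loads that are clock buffers
    set load_pins [get_pins -quiet -leaf -of_objects $out_nets -filter {DIRECTION == IN}]
    if {[llength $load_pins] == 0} { continue }
    set loads [get_cells -quiet -of_objects $load_pins -filter $buf_filter]

    foreach load $loads {
        set src_cr [get_clock_regions -quiet -of_objects $buf]
        set dst_cr [get_clock_regions -quiet -of_objects $load]
        puts "cascade: $buf ($src_cr) -> $load ($dst_cr)"
    }
}
```

Each reported pair can then be checked by hand: the two clock regions should be the same or adjacent, and buffers at the same level of balanced branches should share a clock region.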