Divided Outputs on MBUFGCE Primitives Not Allowed for Boundary Clock Nets
MBUFG primitives in Versal devices allow clock division at the leaf level to reduce clock track utilization and improve timing closure on synchronous CDCs. For DFX designs, MBUFG optimization is allowed only for static clock nets, for internal RM clock nets, or when only the undivided O1 output is used at the RP boundary. The O1 clock output from a static MBUFG can drive loads in one or more dynamic regions. Boundary clock nets can continue to use BUFGCE_DIV/MMCM/PLL clocking primitives for clock division. However, this yields less QoR benefit than using MBUFG primitives, because MBUFG primitives provide a common clock node closer to the loads at the leaf level. Therefore, it is recommended to use an MMCM/PLL inside the partitions of the DFX design to convert a boundary clock net into an internal clock net that can leverage the full set of MBUFG optimizations in the Vivado tools.
The CLRB_LEAF input on the MBUFG primitives is used to asynchronously reset the BUFDIV_LEAF dividers. There are cases where special handling is required to ensure that the BUFDIV_LEAF dividers are reset to their startup state. If the clock modifying block that drives the MBUFG is reset during operation, the MBUFG output clocks should also be reset so that they are resynchronized.
If divided output clocks from an MBUFG in the static region drive inputs to Reconfigurable Partitions, the following error occurs:
ERROR: [DRC HDPR-99] Versal Illegal MBUFGxx drivers in pblock: Reconfigurable Pblock '<pblock_name>' contains a MBUFGxx boundary clock net driver '<MBUFG Driver Name>'
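The rule above can be sketched as a small decision function. This is only an illustrative model, not the Vivado DRC itself: the `MbufgNet` class, the `STATIC` label, and the region names are hypothetical, and the sketch assumes that a net counts as a boundary clock net whenever any load sits in a different region than the driver.

```python
from dataclasses import dataclass

STATIC = "static"  # hypothetical label for the static region


@dataclass(frozen=True)
class MbufgNet:
    """Hypothetical description of one clock net driven by an MBUFG output."""
    output_pin: str          # "O1" is the undivided output; any other pin is divided
    driver_region: str       # STATIC or the name of an RP
    load_regions: frozenset  # regions (STATIC or RP names) that contain loads


def is_boundary_net(net: MbufgNet) -> bool:
    # Assumption: the net crosses an RP boundary if any load lives
    # in a different region than the MBUFG that drives it.
    return any(region != net.driver_region for region in net.load_regions)


def is_allowed(net: MbufgNet) -> bool:
    # Static-only and RM-internal nets may use every MBUFG output.
    if not is_boundary_net(net):
        return True
    # At the RP boundary, only the undivided O1 output is permitted.
    return net.output_pin == "O1"
```

For example, under these assumptions a divided output of a static MBUFG that drives loads in `rp1` is rejected, while the O1 output of the same MBUFG may drive loads in several RPs.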
Restrictions in Clock Resource Usage Due to Clock Tile Splitting
When a row of clock tiles is shared between multiple RPs, some of the sites along this row might not be usable for placement. To avoid potential unroutability caused by tile splitting, the DFX flow automatically prohibits the use of certain clocking or logic resource tiles. To avoid this scenario, AMD recommends keeping a gap of at least one clock region between RP Pblocks if the utilization estimate leaves room for it.
For example, in a multi-RP design with two RPs (RP1 and RP2), if a clock for RP2 must traverse RP1 to reach its loads in RP2, some block RAM sites in the traversed RP (RP1) are prohibited for use by the placer.
In this scenario, some of the clock routing resources in RP1 are also claimed by RP2, so they are shared equally. The RCLK_BRAM_CLKBUF_* RCLK tiles are part of the clock routing network and, because they are shared by the two RPs, RP1 can claim only the top or bottom half. Because of how configuration frames are programmed during reconfiguration, RCLK_BRAM_CLKBUF tiles must be programmed together with all BRAM tiles in the same half column. A Critical Warning is issued during opt_design for such a scenario. The prohibited sites can be viewed in the Device View.
[Constraints 18-5689] RCLK tile RCLK_BRAM_CLKBUF_CORE_X*Y* is shared by PBLOCK RP1 (owns LSB tracks) and PBLOCK RP2 (owns MSB tracks). For the shared usage, BRAM tiles and their adjacent interface tiles at the NORTH of the shared RCLK tile are prohibited because they could not be used for placement within PBLOCK RP1.
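The recommendation in this section to keep a gap of at least one clock region between RP Pblocks can be checked early in floorplanning with a simple geometric test. The following is a minimal sketch, assuming each Pblock is described by the inclusive range of clock regions it touches (for example X0Y0:X2Y1); the `PblockRegions` type and both function names are hypothetical and are not part of any Vivado API.

```python
from typing import NamedTuple


class PblockRegions(NamedTuple):
    """Inclusive clock-region bounds touched by a Pblock, e.g. X0Y0:X2Y1."""
    x_min: int
    y_min: int
    x_max: int
    y_max: int


def axis_gap(a_min: int, a_max: int, b_min: int, b_max: int) -> int:
    """Whole clock regions strictly between two ranges on one axis.

    Returns 0 when the ranges are adjacent and a negative value when they overlap.
    """
    return max(b_min - a_max, a_min - b_max) - 1


def has_clock_region_gap(a: PblockRegions, b: PblockRegions, min_gap: int = 1) -> bool:
    """True if the two Pblocks are separated by at least min_gap clock regions
    along the X axis or along the Y axis."""
    gap_x = axis_gap(a.x_min, a.x_max, b.x_min, b.x_max)
    gap_y = axis_gap(a.y_min, a.y_max, b.y_min, b.y_max)
    return gap_x >= min_gap or gap_y >= min_gap
```

With this model, Pblocks covering X0Y0:X2Y1 and X4Y0:X5Y1 pass the check because column X3 separates them, whereas X0Y0:X2Y1 and X3Y0:X5Y1 are adjacent and fail it. The sketch only tests the spacing guideline; it does not predict which sites the DFX flow actually prohibits.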
Clocking Instances Can Be Prohibited Due to Expanded Routing Footprints
For DFX designs with two or more Reconfigurable Partitions, clock buffers can be blocked from use if the expanded footprints of two RPs cover the same resources. In the following floorplan, both RPs (rp1rm1, shown in blue, and rp2rm1, shown in yellow) occupy space in the X9 column (the farthest clock region on the right). Clock region X9Y2 is legally shared between the two partitions.
When routing expansion obtains BUFG_GT resources, both Pblocks try to collect the sites along the right side of the chip. The following image shows the expanded routing footprint of pblock_rp1rm1, which includes sites within the pblock_rp2rm1 area.