In global clock routing, the clock net is first routed from a global clock buffer over the horizontal and vertical routing tracks to a central location called the clock root. From the clock root, the clock net drives the clock rows in each clock region through three levels of vertical distribution tracks, forming a balanced H-tree that minimizes clock skew in the vertical direction. The horizontal clock distribution is segmented at the clock region boundaries, and programmable delays at those boundaries provide skew balancing in the horizontal direction.
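The Python sketch below is a deliberately simplified model of horizontal skew balancing, not the device's actual delay behavior. It assumes, hypothetically, that every clock region crossed adds the same routing delay `T_SEG_PS`, and that each column's programmable delay makes up the difference to the column farthest from the clock root. Under those assumptions the compensating delay is largest at the root column, drops to zero at the far edge, and every column sees the same total insertion delay.

```python
T_SEG_PS = 50  # hypothetical routing delay per clock region crossed, in picoseconds


def balance_row(num_columns: int, root_column: int) -> list:
    """Return routing, programmable, and total delay for each clock-region column.

    Simplified model: the programmable delay in a column pads its arrival time
    up to that of the column farthest from the clock root.
    """
    distances = [abs(col - root_column) for col in range(num_columns)]
    d_max = max(distances)
    row = []
    for col, d in enumerate(distances):
        routing = d * T_SEG_PS
        # Largest at the root (d == 0), zero at the farthest column (d == d_max).
        programmable = (d_max - d) * T_SEG_PS
        row.append({
            "column": col,
            "routing": routing,
            "programmable": programmable,
            "total": routing + programmable,  # identical for every column
        })
    return row


if __name__ == "__main__":
    for entry in balance_row(num_columns=5, root_column=2):
        print(entry)
```

With five columns and the root in the middle, every column arrives at 100 ps in this model; moving the root to column 0 raises the balanced insertion delay to 200 ps. The numbers are illustrative only, but they show how skew balancing trades skew for insertion delay and why the root location matters.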
The programmable delays are largest at the clock root and decrease toward the edges of the clock network. Balancing skew this way adds insertion delay, so for some clocking topologies it can be more important to reduce clock insertion delay than to minimize clock skew. For example, for synchronous clock domain crossing (CDC) paths where MBUFG cannot be used and parallel BUFG_GT or BUFGCE_DIV clock buffers drive the synchronous clocks, minimizing insertion delay can be important because it reduces the variation between the minimum and maximum delays of the related clocks. In this case, apply the CLOCK_DELAY_GROUP property to the parallel clocks so that their clock routing is matched. Using the USER_CLOCK_ROOT property to place the clock root next to the loads that require the lowest insertion delay further reduces the skew between the synchronous CDC clocks.
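As a sketch of how the two properties might be applied together, the Python helper below emits XDC `set_property` commands as text. The group name `cdc_grp`, the net names `clk_div1` and `clk_div2`, and the clock region `X2Y1` are hypothetical placeholders; the nets are assumed to be the clock nets driven by the parallel BUFG_GT or BUFGCE_DIV buffers.

```python
from typing import List, Optional


def clock_constraints(group: str, nets: List[str],
                      root_region: Optional[str] = None) -> str:
    """Build XDC lines that match the routing of parallel clock nets.

    `group`, `nets`, and `root_region` are caller-supplied placeholders;
    nothing here queries a real design.
    """
    if len(nets) < 2:
        # A delay group only makes sense for two or more related clock nets.
        raise ValueError("CLOCK_DELAY_GROUP needs at least two clock nets")
    net_list = "{" + " ".join(nets) + "}"
    lines = [f"set_property CLOCK_DELAY_GROUP {group} [get_nets {net_list}]"]
    if root_region is not None:
        # Optionally place the clock root next to the loads that need the
        # lowest insertion delay.
        lines.append(f"set_property USER_CLOCK_ROOT {root_region} [get_nets {net_list}]")
    return "\n".join(lines)


if __name__ == "__main__":
    print(clock_constraints("cdc_grp", ["clk_div1", "clk_div2"], "X2Y1"))
```

Generating both commands from the same net list keeps the grouped nets and the user-assigned clock root consistent, which is the point of applying the two properties together.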