Limit High-Fanout Nets in Congested Areas - 2023.2 English

UltraFast Design Methodology Guide for FPGAs and SoCs (UG949)

Note: This optimization technique is automatically applied by the report_qor_suggestions Tcl command.

High-fanout nets that have tight timing constraints require tightly clustered placement to meet timing. This can cause localized congestion, as shown in the following figure. High-fanout nets can also contribute to congestion by consuming routing resources, which are then no longer available to other nets in the congestion window.

To analyze the impact of high-fanout non-global nets on routability in the congestion window:

  • Select the leaf cells of the top hierarchical modules in the congestion window.
  • Use the Find command (Edit > Find) to select all nets connected to the selected cells, filtering out global clock, power, and ground nets.
  • Sort the nets in decreasing Flat Pin Count order.
  • Select the highest-fanout nets to show them in relation to the congestion window.

This helps you quickly identify high-fanout nets that potentially contribute to congestion.
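The same analysis can be approximated from the Tcl console. The following is a minimal sketch, not a definitive procedure. It assumes that the leaf cells in the congestion window are already selected in the GUI, that the net TYPE property uses the values GLOBAL_CLOCK, POWER, and GROUND, and that listing 10 nets is enough (an arbitrary choice).

```tcl
# Assumption: the leaf cells in the congestion window are currently selected in the GUI.
set cells [get_selected_objects]

# Nets attached to the selected cells, excluding global clock, power, and ground nets.
set nets [get_nets -quiet -of_objects $cells \
    -filter {TYPE != GLOBAL_CLOCK && TYPE != POWER && TYPE != GROUND}]

# Pair each net with its flat pin count, then sort in decreasing order.
set ranked {}
foreach net $nets {
    lappend ranked [list [get_property FLAT_PIN_COUNT $net] $net]
}
set ranked [lsort -integer -decreasing -index 0 $ranked]

# Print the 10 highest-fanout nets (10 is an arbitrary choice) and collect them.
set top {}
foreach pair [lrange $ranked 0 9] {
    lassign $pair count net
    puts [format "%6d  %s" $count $net]
    lappend top $net
}

# Select the nets so they are shown in relation to the congestion window.
if {[llength $top] > 0} {
    select_objects [get_nets $top]
}
```

The report_high_fanout_nets Tcl command provides a report-based view of similar information.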

Figure 1. High-Fanout Nets in Congestion Window

For high-fanout nets with tight timing constraints in the congestion window, replicating the driver helps relax the placement constraints and alleviate congestion.
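One way to replicate a driver is during physical optimization. The following sketch assumes a placed design; the net name is hypothetical and must be replaced with a net identified in the congestion window.

```tcl
# Hypothetical net name; substitute a high-fanout net found in the congestion window.
set hfn [get_nets top/ctrl/enable_reg]

# Force replication of the net's driver during physical optimization
# (run after place_design).
phys_opt_design -force_replication_on_nets $hfn
```

After replication, rerun the congestion and timing analysis to confirm that the change actually helps, because additional drivers also consume logic and routing resources.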

High-fanout nets (fanout > 5000) with sufficient positive timing slack can be routed on global clock resources instead of fabric resources. The placer automatically routes high-fanout nets with fanout > 1000 on global routing resources if those resources are available toward the end of the placer step. This optimization only occurs if it does not degrade timing.

You can also set the property CLOCK_BUFFER_TYPE=BUFG on the net and let synthesis or logic optimization automatically insert the buffer prior to the placer step. After place_design, review the placement of the newly inserted buffer, its driver, and its loads to verify that it is optimal. If it is not optimal, use the CLOCK_REGION constraint (UltraScale devices only) or LOC constraint (7 series devices only) on the clock buffer to control its placement.
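These constraints can be sketched in XDC/Tcl as follows. The net name, buffer cell name, clock region, and site are all hypothetical. The tool names the inserted buffer, so look up the actual cell name in the design after the buffer is inserted.

```tcl
# Hypothetical net name: request a global buffer on the high-fanout net.
set_property CLOCK_BUFFER_TYPE BUFG [get_nets top/ctrl/sync_reset]

# If the buffer placement is not optimal after place_design
# (hypothetical cell name, clock region, and site):

# UltraScale devices: constrain the buffer to a clock region.
set_property CLOCK_REGION X2Y2 [get_cells top/ctrl/sync_reset_BUFG_inst]

# 7 series devices: constrain the buffer to a specific site instead.
# set_property LOC BUFGCTRL_X0Y4 [get_cells top/ctrl/sync_reset_BUFG_inst]
```

Use only the placement constraint that matches the target device family.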