Step 4: Improving Bus Timing through Placement - 2023.2 English

Vivado Design Suite Tutorial: Implementation (UG986)

To improve the timing of the wbOutputData bus, place the output registers closer to their respective output pads, then rerun timing to look for any improvement. To place the output registers, identify potential placement sites, and then use a sequence of Tcl commands, or a Tcl script, to place the cells and reroute the connections.
  1. In the Device window, click to disable Routing Resources and make sure that Autofit Selection is still enabled on the toolbar.

    This lets you see placed objects more clearly in the Device window, without the added details of the routing.

  2. Select the wbOutputData ports placed on the I/O blocks with the following Tcl command:
    select_objects [get_ports wbOutputData*]

    The Device window highlights the selected ports in white and zooms to fit the selection. By examining the device resources around the selected ports, you can identify a range of placement sites for the output registers.

  3. Zoom into the Device window around the bottom selected output ports. The following figure shows the results.

    The bottom ports are the lowest bits of the output bus, starting with wbOutputData[0].

    This port is placed on Package Pin Y21. Over to the right, where the Slice logic contains the device resources needed to place the output registers, the Slice coordinates are X0Y36. Use that location as the starting placement for the 32 output registers, wbOutputData_reg[31:0].

    By scrolling or panning in the Device window, you can visually confirm that the highest output data port, wbOutputData[31], is placed on Package Pin K22, and the registers to the right are in Slice X0Y67.

    Now that you have identified the placement resources needed for the output registers, make sure that they are available for placing the cells. You can do this by quickly unplacing the Slices to clear any currently placed logic.
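
    The mapping from output bit to slice described above (bit 0 at SLICE_X0Y36, bit 31 at SLICE_X0Y67) can be sketched in plain Tcl. The helper name slice_for_bit and its default arguments are assumptions made for illustration; they are not Vivado commands.

```tcl
# Hypothetical helper (not part of Vivado): map an output bit index to the
# slice site name used in this tutorial, starting at SLICE_X0Y36 for bit 0.
proc slice_for_bit {bit {base_y 36} {x 0}} {
    if {![string is integer -strict $bit] || $bit < 0} {
        error "bit index must be a non-negative integer, got '$bit'"
    }
    return [format "SLICE_X%dY%d" $x [expr {$base_y + $bit}]]
}

puts [slice_for_bit 0]   ;# SLICE_X0Y36
puts [slice_for_bit 31]  ;# SLICE_X0Y67
```

    The two printed names match the slices identified visually for wbOutputData[0] and wbOutputData[31], which confirms that a base of 36 and a count of 32 cover the whole bus.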

  4. Unplace any cells currently assigned to the range of slices needed for the output registers, SLICE_X0Y36 to SLICE_X0Y67, with the following Tcl command:
    for {set i 0} {$i<32} {incr i}  {
       unplace_cell [get_cells -of [get_sites SLICE_X0Y[expr 36 + $i]]]
    }

    This command uses a for loop with an index counter (i) and a Tcl expression (36 + $i) to get and unplace any cells found in the specified range of Slices. For more information on for loops and other scripting suggestions, refer to the Vivado Design Suite User Guide: Using Tcl Scripting (UG894).

    Tip: If there are no cells placed within the specified slices, warning messages are displayed stating that nothing has been unplaced. You can safely ignore these messages.

    With the slices cleared of any current logic cells, the needed resources are available for placing the output registers. After placing the registers, place again any logic that was unplaced in the previous step.
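
    If you prefer to avoid the warnings mentioned in the tip, one possible variation is to check each slice for cells before unplacing. The following is a minimal sketch, not the tutorial's method: unplace_slice_range is a hypothetical helper name, and the proc only works inside Vivado with the design open because it calls get_sites, get_cells, and unplace_cell.

```tcl
# Sketch only: unplace_slice_range is a hypothetical helper, not a Vivado command.
# It calls the Vivado Tcl commands get_sites, get_cells, and unplace_cell,
# so invoke it from the Vivado Tcl Console with the design open.
proc unplace_slice_range {first_y count {x 0}} {
    set cleared {}
    for {set i 0} {$i < $count} {incr i} {
        set site_name "SLICE_X${x}Y[expr {$first_y + $i}]"
        # -quiet keeps get_cells silent when the site holds no cells.
        set cells [get_cells -quiet -of_objects [get_sites $site_name]]
        if {[llength $cells] > 0} {
            unplace_cell $cells
            lappend cleared $site_name
        }
    }
    # Return the names of the sites that actually had logic removed.
    return $cleared
}

# Usage for this tutorial (SLICE_X0Y36 through SLICE_X0Y67):
#   unplace_slice_range 36 32
```

    The returned list gives a quick record of which slices held logic, which can be useful when checking that the later placement step puts that logic back.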

  5. Place the output registers, wbOutputData_reg[31:0], in the specified slice range with the following command:
    for {set i 0} {$i<32} {incr i}  {
       place_cell wbOutputData_reg[$i] SLICE_X0Y[expr 36 + $i]/AFF
    }
  6. Place any remaining unplaced cells with the following command:
    place_design
    Note: The Vivado placer works incrementally on a partially placed design.
  7. As a precaution, unroute any nets connected to the output register cells, wbOutputData_reg[31:0], using the following Tcl command:
    route_design -unroute -nets [get_nets -of [get_cells wbOutputData_reg[*]]]
  8. Route any currently unrouted nets in the design:
    route_design
    Note: The Vivado router works incrementally on a partially routed design.
  9. Analyze the route status of the current design to ensure that there are no routing conflicts:
    report_route_status
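
    Steps 5 through 9 can also be collected into a small script, as suggested at the start of this step. The following is a sketch under stated assumptions: the proc names output_reg_placements and place_and_route_output_regs are hypothetical, while place_cell, place_design, route_design, report_route_status, get_cells, and get_nets are Vivado Tcl commands that require an open design.

```tcl
# Sketch only: both proc names are hypothetical. The commands they call are
# Vivado Tcl commands, so run place_and_route_output_regs inside Vivado.

# Build {cell bel} pairs: wbOutputData_reg[i] -> SLICE_X0Y(36+i)/AFF.
proc output_reg_placements {{bits 32} {first_y 36}} {
    set pairs {}
    for {set i 0} {$i < $bits} {incr i} {
        # A braced format template keeps the [ ] bus index literal.
        set cell [format {wbOutputData_reg[%d]} $i]
        set bel  [format {SLICE_X0Y%d/AFF} [expr {$first_y + $i}]]
        lappend pairs [list $cell $bel]
    }
    return $pairs
}

proc place_and_route_output_regs {} {
    # Step 5: place each output register on its target flip-flop.
    foreach pair [output_reg_placements] {
        lassign $pair cell bel
        place_cell $cell $bel
    }
    # Step 6: place any cells that are still unplaced.
    place_design
    # Step 7: unroute the nets attached to the output registers.
    route_design -unroute -nets \
        [get_nets -of_objects [get_cells {wbOutputData_reg[*]}]]
    # Step 8: route whatever is currently unrouted.
    route_design
    # Step 9: report the route status to check for conflicts.
    report_route_status
}
```

    Keeping the name generation in its own proc makes the cell-to-site mapping easy to inspect (for example, with puts [output_reg_placements]) before any placement command modifies the design.
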
  10. Click the Routing Resources button to view the detailed routing resources in the Device window.
  11. Mark the output ports and registers again, and re-highlight the routing between them using the following Tcl commands:
    mark_objects -color blue [get_ports wbOutputData[*]]
    mark_objects -color red [get_cells wbOutputData_reg[*]]
    highlight_objects -color yellow [get_nets -of [get_pins -of [get_cells\
    wbOutputData_reg[*]] -filter DIRECTION==OUT]]
    Tip: Because you have entered these commands before, you can copy them from the journal file (vivado.jou) to avoid typing them again.
  12. In the Device window, zoom into some of the marked output ports.
  13. Select the nets connecting to them.
    Tip: You can also select the nets in the Netlist window, and they are cross-selected in the Device window.

    In the Device window, you can see that all output registers are now placed equidistant from their associated outputs, and the routing path is similar for all the nets from the output register to output. This results in clock-to-out times that are closely matched between the outputs.

  14. Run the Reports > Timing > Report Datasheet command again.

    The Report Datasheet dialog box is populated with settings from the last time you ran it:

    • Reference: [get_ports {wbOutputData[0]}]
    • Ports: [get_ports {wbOutputData[*]}]
  15. In the Report Datasheet results, select the Max/Min Delays for Groups > Clocked by wbClk > wbOutputData[0] section.

    Examining the results, the timing skew is closely matched within both the lower bits, wbOutputData[0-13], and the upper bits, wbOutputData[14-31], of the output bus. While the overall skew is reduced, it is still over 200 ps between the upper and lower bits.

    With the improved placement, the skew is now a result of the output ports and registers spanning two clock regions, X0Y0 and X0Y1, which introduces clock network skew. Looking at the wbOutputData bus, the Max delay is greater on the lower bits than it is on the upper bits. To reduce the skew, add delay to the upper bits.

    You can eliminate some of the skew by using a BUFMR/BUFR combination instead of a BUFG to clock the output registers. However, for this tutorial, use manual routing to add delay from the output registers clocked by the BUFG to the output pins for the upper bits, wbOutputData[14-31], to further reduce the clock-to-out variability within the bus.