Tandem Configuration and Dynamic Function eXchange - 2024.1 English

Vivado Design Suite User Guide: Dynamic Function eXchange (UG909)

Document ID: UG909
Release Date: 2024-06-12
Version: 2024.1 English

UltraScale devices introduced the MCAP, a dedicated connection from one specific PCIe block on a device to the configuration engine, providing an efficient mechanism for delivering partial bitstreams. No explicit routes are required to connect the PCIe block to the configuration engine, saving considerable resources.

The MCAP is enabled by customizing an AMD PCIe IP with Dynamic Function eXchange or Tandem Configuration features. These features are available for four IP cores that support PCI Express:

  • UltraScale Devices Gen3 Integrated Block for PCI Express LogiCORE IP Product Guide (PG156)
  • AXI Bridge for PCI Express Gen3 Subsystem Product Guide (PG194)
  • DMA/Bridge Subsystem for PCI Express Product Guide (PG195)
  • UltraScale+ Devices Integrated Block for PCI Express LogiCORE IP Product Guide (PG213)

Tandem Configuration uses a two-stage methodology that enables the IP to meet the configuration time requirements indicated in the PCI Express Specification. The following use cases are supported with this technology:

Tandem PROM
Loads the single two-stage bitstream from the flash.
Tandem PCIe
Loads the first stage bitstream from flash, and delivers the second stage bitstream over the PCIe link to the MCAP.
Tandem with Field Updates
After a Tandem PROM (UltraScale only) or Tandem PCIe (UltraScale or UltraScale+) initial configuration, update the entire user design while the PCIe link remains active. The update region (floorplan) and design structure are pre-defined, and Tcl scripts are provided.
Tandem + Dynamic Function eXchange
This is a more general case of Tandem Configuration followed by Dynamic Function eXchange of any size or number of dynamic regions.
DFX over PCIe
This is a standard configuration followed by DFX, using the PCIe / MCAP as the delivery path of partial bitstreams.

The combined Tandem and DFX solutions add few requirements. The main one is that the Pblocks for the HD.TANDEM_IP_PBLOCK and HD.RECONFIGURABLE parts of the design must not overlap. Otherwise, as in a standard DFX design, you can define any number of RPs of any size.
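The non-overlap requirement can be pictured with a small sketch. It assumes, purely for illustration, that each Pblock is reduced to a list of rectangles on a common integer grid with inclusive corners; the `Rect` type, the helper names, and the coordinates are hypothetical and are not a Vivado API. Vivado performs its own checks on real Pblock ranges.

```python
from typing import List, NamedTuple


class Rect(NamedTuple):
    """One rectangle of a Pblock on a hypothetical grid (inclusive corners)."""
    x0: int
    y0: int
    x1: int
    y1: int


def rects_overlap(a: Rect, b: Rect) -> bool:
    # Two inclusive rectangles overlap when their ranges intersect on both axes.
    return a.x0 <= b.x1 and b.x0 <= a.x1 and a.y0 <= b.y1 and b.y0 <= a.y1


def pblocks_overlap(p: List[Rect], q: List[Rect]) -> bool:
    # A Pblock may consist of several rectangles; any intersecting pair counts.
    return any(rects_overlap(a, b) for a in p for b in q)


# Hypothetical floorplan: one Tandem stage 1 Pblock and two candidate RP Pblocks.
tandem_pblock = [Rect(0, 0, 20, 59)]
good_rp_pblock = [Rect(21, 0, 60, 119)]   # starts one column to the right
bad_rp_pblock = [Rect(15, 30, 40, 89)]    # cuts into the Tandem region

print(pblocks_overlap(tandem_pblock, good_rp_pblock))  # False
print(pblocks_overlap(tandem_pblock, bad_rp_pblock))   # True
```

In this sketch, only the first RP floorplan would satisfy the requirement, because it shares no grid location with the Tandem region.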

To enable any of these capabilities, select the appropriate option when customizing the core. In the Basic tab:

  1. Change the Mode to Advanced.
  2. For UltraScale devices, set the Tandem Configuration or Partial Reconfiguration option according to your use case:
    • Tandem for Tandem PROM, Tandem PCIe or Tandem + Dynamic Function eXchange use cases
    • Tandem with Field Updates ONLY for the pre-defined Field Updates use case
    • DFX over PCIe to enable the MCAP link for Dynamic Function eXchange, without enabling Tandem Configuration
  3. For UltraScale+ devices, set the Tandem Configuration or Partial Reconfiguration option according to your use case:
    • Tandem PROM for Tandem PROM or Tandem + Dynamic Function eXchange use cases
    • Tandem PCIe for Tandem PCIe or Tandem + Dynamic Function eXchange use cases
    • Tandem PCIe with Field Updates ONLY for the pre-defined Field Updates use case; Tandem PROM does not support Field Updates in UltraScale+
    • DFX over PCIe to enable the MCAP link for Dynamic Function eXchange, without enabling Tandem Configuration
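The option lists above amount to a lookup from device family and use case to the value to choose in the IP customization. The following sketch restates that lookup as data. The option and use-case names are transcribed from the lists above; the dictionary layout and the `select_option` helper are hypothetical and do not correspond to any AMD tool or API.

```python
from typing import Dict, List

# family -> use case -> option(s) to select in the Basic tab (Advanced mode).
IP_OPTIONS: Dict[str, Dict[str, List[str]]] = {
    "UltraScale": {
        "Tandem PROM": ["Tandem"],
        "Tandem PCIe": ["Tandem"],
        "Tandem + Dynamic Function eXchange": ["Tandem"],
        "Tandem with Field Updates": ["Tandem with Field Updates"],
        "DFX over PCIe": ["DFX over PCIe"],
    },
    "UltraScale+": {
        "Tandem PROM": ["Tandem PROM"],
        "Tandem PCIe": ["Tandem PCIe"],
        # Either option applies, depending on how the second stage is loaded.
        "Tandem + Dynamic Function eXchange": ["Tandem PROM", "Tandem PCIe"],
        # Tandem PROM does not support Field Updates in UltraScale+.
        "Tandem with Field Updates": ["Tandem PCIe with Field Updates"],
        "DFX over PCIe": ["DFX over PCIe"],
    },
}


def select_option(family: str, use_case: str) -> List[str]:
    """Return the candidate option values for a family and use case."""
    try:
        return IP_OPTIONS[family][use_case]
    except KeyError:
        raise ValueError(f"unsupported combination: {family!r} / {use_case!r}") from None


print(select_option("UltraScale", "Tandem PCIe"))
# ['Tandem']
print(select_option("UltraScale+", "Tandem + Dynamic Function eXchange"))
# ['Tandem PROM', 'Tandem PCIe']
```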

In most cases, the PCIe block that must be selected is the lowest instance in the device. The exception is SSI devices with three super logic regions (SLRs), where it is the lowest PCIe instance in the center SLR. The specific supported blocks are listed in Table 1 (UltraScale) and Table 2 (UltraScale+). No other PCIe blocks have the dedicated MCAP feature.

Tandem with Field Updates: Tandem (PCIe) with Field Updates is a predefined design structure that combines Tandem and DFX in a single design.
In UltraScale devices
Both stage 1 and stage 2 must come from the same design image. At the field update event, clearing bitstreams are required, and partial bitstreams (not stage 2 bitstreams) are delivered to change the application.
In UltraScale+ devices
Reconfigurable Stage 2 is supported. This means that after stage 1 is loaded, any compatible stage 2 bitstream can be delivered over the PCIe link to complete the initial configuration. At the field update event, any compatible stage 2 bitstream can be used as a partial bitstream to change the application.
Tip: Designers using the Field Updates solution are expected to start from the example design generated by the customized IP.

For complete information about Tandem Configuration, including required PCIe block locations, design flow examples, requirements, restrictions, flow details for Field Updates, and other considerations, see the Tandem Configuration section in UltraScale Devices Gen3 Integrated Block for PCI Express LogiCORE IP Product Guide (PG156) for UltraScale devices. For UltraScale+ devices, see UltraScale+ Devices Integrated Block for PCI Express LogiCORE IP Product Guide (PG213).

Table 1. UltraScale: PCIe Block and Reset Locations Supporting DFX, by Device
Device Package PCIe Block PCIe Reset Location Status
Kintex UltraScale
KU025 PCIE_3_1_X0Y0 IOB_X1Y103 Production
KU035 PCIE_3_1_X0Y0 IOB_X1Y103 Production
KU040 PCIE_3_1_X0Y0 IOB_X1Y103 Production
KU060 PCIE_3_1_X0Y0 IOB_X2Y103 Production
KU085 PCIE_3_1_X0Y0 IOB_X2Y103 Production
KU095 PCIE_3_1_X0Y0 IOB_X1Y103 Production
KU115 PCIE_3_1_X0Y0 IOB_X2Y103 Production
AMD Virtex™ UltraScale™
VU065 PCIE_3_1_X0Y0 IOB_X1Y103 Production
VU080 PCIE_3_1_X0Y0 IOB_X1Y103 Production
VU095 PCIE_3_1_X0Y0 IOB_X1Y103 Production
VU125 PCIE_3_1_X0Y0 IOB_X1Y103 Production
VU160 PCIE_3_1_X0Y1 IOB_X1Y363 Production
VU190 PCIE_3_1_X0Y2 IOB_X1Y363 Production
VU440 PCIE_3_1_X0Y2 IOB_X1Y363 Production

To find the package pin for the dedicated PCIe Reset on an UltraScale device, issue the following Tcl command, replacing IOB_X1Y103 with the PCIe Reset Location listed for your device in Table 1:

get_package_pins -of_objects [get_sites IOB_X1Y103]
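Because the reset site differs between devices, it can help to generate the query from the Table 1 data instead of typing the site by hand. The sketch below does this under stated assumptions: the site names are transcribed from Table 1, while the `PCIE_RESET_SITE` dictionary and the `reset_pin_query` helper are hypothetical names introduced here. It only builds the command string; the command itself still has to be run in the Vivado Tcl console.

```python
# PCIe Reset Location per UltraScale device, transcribed from Table 1.
PCIE_RESET_SITE = {
    "KU025": "IOB_X1Y103", "KU035": "IOB_X1Y103", "KU040": "IOB_X1Y103",
    "KU060": "IOB_X2Y103", "KU085": "IOB_X2Y103", "KU095": "IOB_X1Y103",
    "KU115": "IOB_X2Y103",
    "VU065": "IOB_X1Y103", "VU080": "IOB_X1Y103", "VU095": "IOB_X1Y103",
    "VU125": "IOB_X1Y103", "VU160": "IOB_X1Y363", "VU190": "IOB_X1Y363",
    "VU440": "IOB_X1Y363",
}


def reset_pin_query(device: str) -> str:
    """Build the Tcl query for the dedicated PCIe Reset pin of a device."""
    site = PCIE_RESET_SITE[device.upper()]  # KeyError for devices not in Table 1
    return f"get_package_pins -of_objects [get_sites {site}]"


print(reset_pin_query("KU060"))
# get_package_pins -of_objects [get_sites IOB_X2Y103]
```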
Table 2. UltraScale+: PCIe Block Locations Supporting DFX, by Device
Device Package PCIe Block Status (1)
Kintex UltraScale+
KU3P PCIE40E4_X0Y0 Production
KU5P PCIE40E4_X0Y0 Production
KU11P PCIE40E4_X1Y0 Production
KU15P PCIE40E4_X1Y0 Production
KU19P No MCAP-enabled PCIe site Unsupported for PCIe delivery (2)
Virtex UltraScale+
VU3P PCIE40E4_X1Y0 Production
VU5P PCIE40E4_X1Y0 Production
VU7P PCIE40E4_X1Y0 Production
VU9P PCIE40E4_X1Y2 Production
VU11P PCIE40E4_X0Y0 Production
VU13P PCIE40E4_X0Y1 Production
VU19P PCIE4CE4_X0Y2 Production
VU23P PCIE4CE4_X0Y0 Production
VU27P PCIE40E4_X0Y Production
VU29P PCIE40E4_X0Y0 Production
VU31P PCIE4CE4_X1Y0 Production
VU33P PCIE4CE4_X1Y0 Production
VU35P PCIE4CE4_X1Y0 Production
VU37P PCIE4CE4_X1Y0 Production
VU45P PCIE4CE4_X1Y0 Production
VU47P PCIE4CE4_X1Y0 Production
VU57P PCIE4CE4_X1Y0 Production
AMD Zynq™ UltraScale+™ MPSoC
ZU4EV/EG/CC PCIE40E4_X0Y1 Production
ZU5EV/EG/CC PCIE40E4_X0Y1 Production
ZU7EV/EG/CC PCIE40E4_X0Y1 Production
ZU11EG PCIE40E4_X1Y0 Production
ZU17EG PCIE40E4_X1Y0 Production
ZU19EG PCIE40E4_X1Y0 Production
  1. For the most up-to-date information on core and device support status, consult the product guide for the specific version of the IP you wish to use.
  2. The KU19P has no master PCIe block instance; none of the three PCIe blocks in the device contain the MCAP connection to the configuration engine. Tandem Configuration is not supported for this device, and any partial bitstream delivery via PCIe must be sent to the ICAP.
Note: Any device not listed in this table does not have a PCIe site in the programmable logic portion of the device, or, like Zynq RFSoC devices, does not have an MCAP-enabled PCIe site in the programmable logic. Unlike UltraScale, UltraScale+ does not have a dedicated connection to a PCIe Reset pin, but AMD recommends using a pin in Bank 65.

The MCAP can operate at 200 MHz with a 32-bit data path. Traditionally, bitstreams are loaded into the MCAP from a host PC through PCI Express configuration packets. In these systems, the host PC and its software are the main factors limiting MCAP performance and bitstream throughput. Because PCIe performance varies widely across host PCs and host software, overall MCAP throughput varies as well.

For more information and sample drivers, see Answer Record 64761.

If the performance of partial bitstream delivery using the MCAP port is insufficient, the ICAP can be used instead. While this approach does require additional logic to funnel configuration data from the PCIe end point to this internal configuration port, the ICAP can be saturated with 32-bit configuration data at the maximum clock rate (200 MHz for monolithic devices, 125 MHz for SSI devices). See Fast Partial Reconfiguration Over PCI Express Application Note (XAPP1338) for more information and an example design.
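The clock rates and data width quoted above translate directly into peak port bandwidth, which gives a lower bound on partial bitstream load time. The sketch below works through that arithmetic. The 10,000,000-byte bitstream size is a hypothetical example value, and the helper names are introduced here for illustration. The results are theoretical ceilings: as noted above, MCAP delivery is usually limited by the host, so measured times are expected to be longer.

```python
def peak_bytes_per_second(clock_hz: float, width_bits: int = 32) -> float:
    """Peak port bandwidth: one data word of width_bits per clock cycle."""
    return clock_hz * width_bits / 8


def min_load_time_ms(bitstream_bytes: int, clock_hz: float) -> float:
    """Lower bound on load time, assuming the port is kept saturated."""
    return bitstream_bytes / peak_bytes_per_second(clock_hz) * 1e3


PARTIAL_BITSTREAM_BYTES = 10_000_000  # hypothetical partial bitstream size

# Clock rates taken from the text: 200 MHz (MCAP, ICAP on monolithic devices)
# and 125 MHz (ICAP on SSI devices).
for label, clock_hz in [("200 MHz", 200e6), ("125 MHz", 125e6)]:
    mb_per_s = peak_bytes_per_second(clock_hz) / 1e6
    ms = min_load_time_ms(PARTIAL_BITSTREAM_BYTES, clock_hz)
    print(f"{label}: {mb_per_s:.0f} MB/s peak, at least {ms:.1f} ms")
# 200 MHz: 800 MB/s peak, at least 12.5 ms
# 125 MHz: 500 MB/s peak, at least 20.0 ms
```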

Tandem Configuration and Versal Devices

Versal devices support Tandem Configuration for PCIe end points within the CPM. While the device image structure is quite different, the concept of a two-stage load remains. Both Tandem PROM and Tandem PCIe variations are available for devices with CPM instances; Tandem is not supported for the PL-based PCIe sites. See Versal Adaptive SoC CPM Mode for PCI Express Product Guide (PG346) and Versal Adaptive SoC CPM DMA and Bridge Mode for PCI Express Product Guide (PG347) for more information.