UltraScale devices introduced the MCAP, a dedicated connection from one specific PCIe block on a device to the configuration engine, providing an efficient mechanism for delivering partial bitstreams. No explicit routes are required to connect the PCIe block to the configuration engine, saving considerable resources.
The MCAP is enabled by customizing an AMD PCIe IP with Dynamic Function eXchange or Tandem Configuration features. These features are available for the IP cores that support PCI Express, which are documented in the following four product guides:
- UltraScale Devices Gen3 Integrated Block for PCI Express LogiCORE IP Product Guide (PG156)
- AXI Bridge for PCI Express Gen3 Subsystem Product Guide (PG194)
- DMA/Bridge Subsystem for PCI Express Product Guide (PG195)
- UltraScale+ Devices Integrated Block for PCI Express LogiCORE IP Product Guide (PG213)
Tandem Configuration uses a two-stage methodology that enables the IP to meet the configuration time requirements indicated in the PCI Express Specification. The following use cases are supported with this technology:
- Tandem PROM
  - Loads the single two-stage bitstream from the flash.
- Tandem PCIe
  - Loads the first stage bitstream from flash, and delivers the second stage bitstream over the PCIe link to the MCAP.
- Tandem with Field Updates
  - After a Tandem PROM (UltraScale only) or Tandem PCIe (UltraScale or UltraScale+) initial configuration, updates the entire user design while the PCIe link remains active. The update region (floorplan) and design structure are pre-defined, and Tcl scripts are provided.
- Tandem + Dynamic Function eXchange
  - This is a more general case of Tandem Configuration followed by Dynamic Function eXchange of any size or number of dynamic regions.
- DFX over PCIe
  - This is a standard configuration followed by DFX, using the PCIe / MCAP as the delivery path of partial bitstreams.
The Tandem and DFX combined solutions have few additional requirements. This approach requires that the Pblocks for the HD.TANDEM_IP_PBLOCK and HD.RECONFIGURABLE parts of the design do not overlap. Otherwise, as in a standard DFX design, RPs of any number or size can be defined.
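The non-overlap requirement lends itself to a mechanical check. The following Python sketch is illustrative only: it models each Pblock as a single inclusive rectangle of grid coordinates (real Pblocks can combine several site ranges), and the `Rect` type, the `overlaps` helper, and all coordinates are hypothetical rather than part of any Vivado API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Inclusive rectangle of grid coordinates (hypothetical Pblock model)."""
    x0: int
    y0: int
    x1: int
    y1: int


def overlaps(a: Rect, b: Rect) -> bool:
    """Return True if the two inclusive rectangles share at least one grid cell."""
    return a.x0 <= b.x1 and b.x0 <= a.x1 and a.y0 <= b.y1 and b.y0 <= a.y1


# Made-up coordinates, for illustration only.
tandem_ip = Rect(0, 0, 20, 59)    # stands in for the HD.TANDEM_IP_PBLOCK region
rp_ok = Rect(21, 0, 60, 59)       # reconfigurable region placed beside it
rp_bad = Rect(15, 30, 40, 90)     # reconfigurable region that intrudes on it

print(overlaps(tandem_ip, rp_ok))   # False
print(overlaps(tandem_ip, rp_bad))  # True
```

In an actual design, the ranges would come from the Pblock definitions in the floorplan rather than from hard-coded values.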
To enable any of these capabilities, select the appropriate option when customizing the core. In the Basic tab:
- Change the Mode to Advanced.
- Change the Tandem Configuration or Partial Reconfiguration option according to your desired case in UltraScale:
  - Tandem for Tandem PROM, Tandem PCIe, or Tandem + Dynamic Function eXchange use cases
  - Tandem with Field Updates ONLY for the pre-defined Field Updates use case
  - DFX over PCIe to enable the MCAP link for Dynamic Function eXchange, without enabling Tandem Configuration
- Change the Tandem Configuration or Partial Reconfiguration option according to your desired case in UltraScale+:
  - Tandem PROM for Tandem PROM or Tandem + Dynamic Function eXchange use cases
  - Tandem PCIe for Tandem PCIe or Tandem + Dynamic Function eXchange use cases
  - Tandem PCIe with Field Updates ONLY for the pre-defined Field Updates use case; Tandem PROM does not support Field Updates in UltraScale+
  - DFX over PCIe to enable the MCAP link for Dynamic Function eXchange, without enabling Tandem Configuration
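The option choices listed above can be summarized as a lookup from device family and use case to the option names shown in the IP customization. The Python sketch below is a reading aid, not a tool interface: the `OPTIONS` table and `ip_options` function are hypothetical names, and the strings are transcribed from the list above.

```python
# Use case -> selectable option(s) per device family, transcribed from the list above.
OPTIONS = {
    "UltraScale": {
        "Tandem PROM": ("Tandem",),
        "Tandem PCIe": ("Tandem",),
        "Tandem + Dynamic Function eXchange": ("Tandem",),
        "Tandem with Field Updates": ("Tandem with Field Updates",),
        "DFX over PCIe": ("DFX over PCIe",),
    },
    "UltraScale+": {
        "Tandem PROM": ("Tandem PROM",),
        "Tandem PCIe": ("Tandem PCIe",),
        # Either Tandem variant can be followed by DFX in UltraScale+.
        "Tandem + Dynamic Function eXchange": ("Tandem PROM", "Tandem PCIe"),
        # Tandem PROM does not support Field Updates in UltraScale+.
        "Tandem with Field Updates": ("Tandem PCIe with Field Updates",),
        "DFX over PCIe": ("DFX over PCIe",),
    },
}


def ip_options(family: str, use_case: str) -> tuple:
    """Return the option name(s) to select for a given family and use case."""
    try:
        return OPTIONS[family][use_case]
    except KeyError:
        raise ValueError(f"unknown combination: {family!r}, {use_case!r}") from None


print(ip_options("UltraScale", "Tandem PCIe"))
print(ip_options("UltraScale+", "Tandem + Dynamic Function eXchange"))
```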
The PCIe block that must be selected in most cases is the lowest instance in the device, except for SSI devices with three super logic regions (SLRs), in which case it is the lowest PCIe instance in the center SLR. A complete listing of the specific supported blocks is shown below in Table 1. No other PCIe blocks have the dedicated MCAP feature.
- In UltraScale devices
  - Both stage 1 and stage 2 must come from the same design image. When it is time for the field update event to occur, clearing bitstreams are required, and partial bitstreams, not stage 2 bitstreams, are delivered to change the application.
- In UltraScale+ devices
  - Reconfigurable Stage 2 is supported. This means that after stage 1 is loaded, any compatible stage 2 bitstream can be delivered over the PCIe link to complete the initial configuration. When it is time for the field update event to occur, any compatible stage 2 bitstream can be used as a partial bitstream to change the application.

Tip: It is expected that any designer using this Field Updates solution will start with the example design generated by the customized IP as the starting point.
For complete information about Tandem Configuration, including required PCIe block locations, design flow examples, requirements, restrictions, flow details for Field Updates, and other considerations, see the Tandem Configuration section in UltraScale Devices Gen3 Integrated Block for PCI Express LogiCORE IP Product Guide (PG156) for UltraScale devices. For UltraScale+ devices, see UltraScale+ Devices Integrated Block for PCI Express LogiCORE IP Product Guide (PG213).
Device | Package PCIe Block | PCIe Reset Location | Status |
---|---|---|---|
Kintex UltraScale | |||
KU025 | PCIE_3_1_X0Y0 | IOB_X1Y103 | Production |
KU035 | PCIE_3_1_X0Y0 | IOB_X1Y103 | Production |
KU040 | PCIE_3_1_X0Y0 | IOB_X1Y103 | Production |
KU060 | PCIE_3_1_X0Y0 | IOB_X2Y103 | Production |
KU085 | PCIE_3_1_X0Y0 | IOB_X2Y103 | Production |
KU095 | PCIE_3_1_X0Y0 | IOB_X1Y103 | Production |
KU115 | PCIE_3_1_X0Y0 | IOB_X2Y103 | Production |
AMD Virtex™ UltraScale™ | |||
VU065 | PCIE_3_1_X0Y0 | IOB_X1Y103 | Production |
VU080 | PCIE_3_1_X0Y0 | IOB_X1Y103 | Production |
VU095 | PCIE_3_1_X0Y0 | IOB_X1Y103 | Production |
VU125 | PCIE_3_1_X0Y0 | IOB_X1Y103 | Production |
VU160 | PCIE_3_1_X0Y1 | IOB_X1Y363 | Production |
VU190 | PCIE_3_1_X0Y2 | IOB_X1Y363 | Production |
VU440 | PCIE_3_1_X0Y2 | IOB_X1Y363 | Production |
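To find the package pin for the dedicated PCIe Reset on an UltraScale device, issue the following Tcl command, substituting the PCIe Reset Location listed for your device in the table above (IOB_X1Y103 in this example):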
get_package_pins -of_objects [get_sites IOB_X1Y103]
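Because the reset site differs between devices, the command string can be generated from the table rather than typed by hand. The Python sketch below covers only a subset of the rows in the UltraScale table above; the `PCIE_RESET_SITE` dictionary and `reset_pin_command` helper are hypothetical names, and the output is simply the Tcl command shown above with the site substituted.

```python
# Subset of the UltraScale table above: device -> dedicated PCIe Reset IOB site.
PCIE_RESET_SITE = {
    "KU025": "IOB_X1Y103",
    "KU035": "IOB_X1Y103",
    "KU040": "IOB_X1Y103",
    "KU060": "IOB_X2Y103",
    "KU085": "IOB_X2Y103",
    "KU095": "IOB_X1Y103",
    "KU115": "IOB_X2Y103",
    "VU080": "IOB_X1Y103",
    "VU160": "IOB_X1Y363",
    "VU440": "IOB_X1Y363",
}


def reset_pin_command(device: str) -> str:
    """Build the Tcl command that reports the PCIe Reset package pin for a device."""
    site = PCIE_RESET_SITE[device.upper()]
    return f"get_package_pins -of_objects [get_sites {site}]"


print(reset_pin_command("KU060"))
# get_package_pins -of_objects [get_sites IOB_X2Y103]
```

The generated string is intended to be pasted into the Vivado Tcl console with the target part open.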
Device | Package PCIe Block | Status |
---|---|---|
Kintex UltraScale+ | ||
KU3P | PCIE40E4_X0Y0 | Production |
KU5P | PCIE40E4_X0Y0 | Production |
KU11P | PCIE40E4_X1Y0 | Production |
KU15P | PCIE40E4_X1Y0 | Production |
KU19P | no MCAP enabled PCIe site | Unsupported for PCIe delivery |
Virtex UltraScale+ | ||
VU3P | PCIE40E4_X1Y0 | Production |
VU5P | PCIE40E4_X1Y0 | Production |
VU7P | PCIE40E4_X1Y0 | Production |
VU9P | PCIE40E4_X1Y2 | Production |
VU11P | PCIE40E4_X0Y0 | Production |
VU13P | PCIE40E4_X0Y1 | Production |
VU19P | PCIE4CE4_X0Y2 | Production |
VU23P | PCIE4CE4_X0Y0 | Production |
VU27P | PCIE40E4_X0Y | Production |
VU29P | PCIE40E4_X0Y0 | Production |
VU31P | PCIE4CE4_X1Y0 | Production |
VU33P | PCIE4CE4_X1Y0 | Production |
VU35P | PCIE4CE4_X1Y0 | Production |
VU37P | PCIE4CE4_X1Y0 | Production |
VU45P | PCIE4CE4_X1Y0 | Production |
VU47P | PCIE4CE4_X1Y0 | Production |
VU57P | PCIE4CE4_X1Y0 | Production |
AMD Zynq™ UltraScale+™ MPSoC | ||
ZU4EV/EG/CC | PCIE40E4_X0Y1 | Production |
ZU5EV/EG/CC | PCIE40E4_X0Y1 | Production |
ZU7EV/EG/CC | PCIE40E4_X0Y1 | Production |
ZU11EG | PCIE40E4_X1Y0 | Production |
ZU17EG | PCIE40E4_X1Y0 | Production |
ZU19EG | PCIE40E4_X1Y0 | Production |
The MCAP can operate at 200 MHz with a 32-bit data path. Traditionally, bitstreams are loaded into the MCAP from a host PC through PCI Express configuration packets. In these systems, the host PC and its software are the main factors that limit MCAP performance and bitstream throughput. Because PCIe performance varies widely across host PCs and host software, overall MCAP throughput varies accordingly.
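To put a measured load time in context, it can be compared with the ceiling implied by the port's clock and data width. The Python sketch below is a back-of-the-envelope calculation that ignores all protocol and software overhead; the function names are hypothetical, and the 20 MB bitstream size and 0.5 s load time are invented inputs, not measurements.

```python
def peak_bytes_per_second(clock_hz: float, width_bits: int) -> float:
    """Upper bound on throughput: one data word accepted per clock cycle."""
    return clock_hz * width_bits / 8


def fraction_of_peak(bitstream_bytes: int, measured_seconds: float, peak: float) -> float:
    """Achieved throughput as a fraction of the theoretical peak."""
    return (bitstream_bytes / measured_seconds) / peak


# 200 MHz with a 32-bit data path, as stated above.
MCAP_PEAK = peak_bytes_per_second(200e6, 32)
print(f"{MCAP_PEAK / 1e6:.0f} MB/s")  # 800 MB/s

# Hypothetical example: a 20 MB partial bitstream that took 0.5 s to deliver.
print(f"{fraction_of_peak(20_000_000, 0.5, MCAP_PEAK):.1%}")  # 5.0%
```

A low percentage in such a comparison points at the host side of the link, which is consistent with the host PC and its software being the limiting factors.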
For more information and sample drivers, see Answer Record 64761.
If the performance of partial bitstream delivery using the MCAP port is insufficient, the ICAP can be used instead. While this approach does require additional logic to funnel configuration data from the PCIe end point to this internal configuration port, the ICAP can be saturated with 32-bit configuration data at the maximum clock rate (200 MHz for monolithic devices, 125 MHz for SSI devices). See Fast Partial Reconfiguration Over PCI Express Application Note (XAPP1338) for more information and an example design.
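The two ICAP clock limits translate directly into a minimum delivery time for a given partial bitstream. The Python sketch below assumes the ICAP is kept saturated with one 32-bit word per cycle, as described above; the names are hypothetical and the 10 MB bitstream size is an invented example.

```python
# Maximum ICAP clock rates stated above, by device type.
ICAP_CLOCK_HZ = {"monolithic": 200e6, "ssi": 125e6}
ICAP_WIDTH_BYTES = 4  # 32-bit configuration data


def min_icap_seconds(bitstream_bytes: int, device_kind: str) -> float:
    """Shortest possible delivery time with the ICAP saturated every cycle."""
    return bitstream_bytes / (ICAP_CLOCK_HZ[device_kind] * ICAP_WIDTH_BYTES)


# Hypothetical 10 MB partial bitstream.
size = 10_000_000
for kind in ("monolithic", "ssi"):
    print(f"{kind}: {min_icap_seconds(size, kind) * 1e3:.1f} ms")
# monolithic: 12.5 ms
# ssi: 20.0 ms
```

These figures are lower bounds; the logic that funnels data from the PCIe endpoint to the ICAP determines how closely a real design approaches them.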
Tandem Configuration and Versal Devices
Versal devices support Tandem Configuration for PCIe end points within the CPM. While the device image structure is quite different, the concept of a two-stage load remains. Both Tandem PROM and Tandem PCIe variations are available for devices with CPM instances; Tandem is not supported for the PL-based PCIe sites. See Versal Adaptive SoC CPM Mode for PCI Express Product Guide (PG346) and Versal Adaptive SoC CPM DMA and Bridge Mode for PCI Express Product Guide (PG347) for more information.