PCIe Subsystem - AM026

Versal AI Edge Series Gen 2 and Prime Series Gen 2 Technical Reference Manual (AM026)

Document ID
AM026
Release Date
2025-12-23
Revision
1.3 English

Architecture Overview

The PCIe subsystem within the PS High Speed Connectivity block is known as MDB5. It supports a 4-lane PCIe 5.0 interface, which can be mapped to up to two independent 2-lane PCIe controllers. Each PCIe controller within MDB5 incorporates DMA/Bridge IP. The DMA mode can be used when the controller is configured as a PCIe Endpoint. The Bridge mode is available when the controller is configured either as a PCIe Endpoint, to provide AXI4-MM to PCIe Endpoint Bridge functionality, or as a PCIe Root Port, to implement a PCIe Root Complex (RC). Each controller also connects through the PIPE mux to the 4-lane PS-GTYP PHY, which can operate at up to 32 Gbps.

The key features of the MDB5 are as follows:
Note: All feature references are applicable to both PCIe controllers with the exception that PCIe0 supports up to x4 operation whereas PCIe1 supports up to x2 operation.
  • Boot time configurable for Root Complex or Endpoint operation
  • Supported line rates of 2.5 Gbps, 5.0 Gbps, 8.0 Gbps, 16.0 Gbps, and 32.0 Gbps
  • PCIe supports up to 8 Physical Functions (PFs) and 64 Virtual Functions (VFs)
  • Support for bifurcation into 2x2 or 2x1 configurations (bifurcation refers to splitting the lanes among multiple independent controllers)
  • 2 independent PCIe controllers each of which can be configured as either Root Complex (RC) or Endpoint (EP) during initialization:
    • 4-lane RC or EP
    • Dual 2-lane RC or 2-lane EP
    • 2-lane RC + 2-lane EP
  • 256-bit Inbound and Outbound AXI4 memory-mapped interface from each PCIe controller
  • Support for line rates up to the PCIe 5.0 maximum (32.0 Gbps)
  • Dedicated Clock and Reset interface
  • MSI-X, MSI and legacy interrupts are supported in Bridge Mode (both Endpoint and Root Port); Endpoint DMA Mode only supports legacy MSI and legacy interrupts
  • Support for Advanced Error Reporting (AER) capability
  • IO TLP support in inbound and outbound directions

Functional Description

Refer to the following figure for the MDB5 block diagram.

Figure 1. MDB5 Block Diagram

Each PCIe controller within MDB5 communicates with the PSXC via a 256b AXI bus. The subsystem also includes switches to support clock domain crossing between the PSXC interconnects. Configuration registers within the PCIe controllers are accessed via a 32b AXI4-Lite bus.

The PCIe IP includes a fully-configurable DMA engine. The DMA engine is used to initiate Endpoint transactions between the Versal AI Edge Series Gen 2 and Versal Prime Series Gen 2’s memory space and an external PCIe Host address space.

Both of the MDB5 PCIe controllers support Endpoint and Root Port modes of operation. As shown in the figure above, each controller comprises two sub-modules.

  • The AXI-PCIe bridge provides AXI-to-PCIe protocol translation and vice-versa, ingress/egress address translation, DMA, and Root Port/Endpoint (RP/EP) mode specific services.
  • The integrated block for PCIe interfaces to the AXI-PCIe bridge on one side and the PS-GTYP transceivers on the other. It performs link negotiation, error detection and recovery, and many other PCIe protocol specific functions.

The following sections provide an overview of the clock and reset schemes for the PCIe controller.

Clock Scheme

The controller for PCIe operates in multiple clock domains, shown in the following figure. The pipe_clk, core_clock, and all other clocks are derived from the reference clock provided to the PS-GTYP.

Figure 2. PCIe Controller Clock Domains

The following table describes the PCIe controller clocks.

Table 1. PCIe Controller Clock Description
Clock | Description
PCIe 100 MHz reference clock | The PCIe protocol specifies a 100 MHz clock with spread spectrum. This clock is used as a reference clock to the PS-GTYP transceiver interface, which is part of the PHY. The PS-GTYP transceiver interface generates a 1000 MHz clock from this reference clock for the parallel datapath. This clock comes from an external interface (typically an on-board clock source in Root Port mode, and sourced by the host system via the PCIe slot in Endpoint mode). Only common clock mode is supported for the reference clock.
core_clock | The primary clock for the controller. This clock is derived from the GTOUTCLK.
ret_core_clock | Gated version of core_clock for modules retained during L1 power gating. Frequency is the same as core_clock.
core_clock_ug | Ungated version of core_clock. Used by registers while switching to and from the low-power state.
aux_clock/aux_clock_g | Clock for the PMC. The controller uses this clock for counting time during the L1 substate.
radm_clk_g | Gated version of core_clock.
slv_aclk/mstr_aclk/dbi_aclk | Clocks for the AXI bridge.

Reset Scheme

Refer to the following figure for the reset scheme of the PCIe controller.

Figure 3. PCIe Controller Reset Scheme

The following table provides a description of the PCIe controller resets.

Table 2. PCIe Controller Reset Description
Reset Description
pcie_reset_n

This is the PCIe protocol reset. In Endpoint mode, this reset is controlled by the host device, and the Endpoint designated MIO pin can be used as an input for this reset. In Root Port mode, this reset is controlled by the software outside the PCIe block, and the MIO pin can be configured as an output to drive the reset.

When the MIO pin is not allocated to the PCIe, this signal is driven High to allow the PCIe block to come out of reset under local software control (pcie_ctrl_rst_n).

Refer to PMC MIO Pin Tables for default MIO pins used for PCIe Reset.

Note: The reset to the AXI-PCIe bridge is determined by the mode of operation i.e., Root Port or Endpoint.

PCIe Subsystem

Refer to the following figure for the PCIe integrated block diagram.

Figure 4. PCIe Subsystem Diagram

The integrated block for PCIe complies with the PCI Express base specification, rev. 5.x and consists of the physical, data link, and transaction layers. The protocol uses packets to exchange information between layers. Packets are formed in the transaction and data link layers to carry information from the transmitting component to the receiving component. Information required to handle the packet at specific layers is added to the transmitted packet. This module implements a large part of the transaction layer logic, all of the data link layer logic, and the MAC portion of the physical layer, including the link training and status state machine (LTSSM). This module connects to the external PHY through the PIPE interface.

The functions of the protocol layers include the following:

  • Generating and processing of Transaction Layer Packets (TLPs)
  • Flow-control management
  • Initialization and power management functions
  • Data protection
  • Error checking and retry functions
  • Physical link interface initialization
  • Maintenance and status tracking

The integrated block for PCIe provides a PIPE interface for connection to the gigabit transceivers. The PIPE interface runs at 1000 MHz in Gen5 mode when the PCIe lanes operate at 32 Gb/s or 500 MHz when the PCIe lanes operate at 16 Gb/s.

The PS-GTYP Transceivers utilize serialization and deserialization (SerDes) to convert data between parallel and serial formats. The high-speed transceivers are used through the multiplexer switch and are shared with other blocks in the PS High-Speed Connectivity Unit.

Configuration Control (APB Interface)

The attributes of the integrated block for PCIe (Endpoint or Root Port mode) are configured through the programmable configuration and status registers (CSR) accessible through the APB interface. The APB interface uses apb_clk, which is asynchronous to the other clocks, and has a 32-bit wide address and 32-bit wide data bus.

Power Management

The PCIe protocol specification defines four low-power states: L0s, L1, L2, and L3, with L0s having the least recovery latency (shallow power state) and L3 having the maximum recovery latency because it can involve turning off the power supply (deep power state). L0 is the normal working link state. In addition to the L0 state, the integrated block for PCIe supports the L1 (low power) state. Entry into L1 from L0 must be initiated by software. The integrated block for PCIe supports active state power management (ASPM).

Programmed Power Management

To achieve considerable power savings on the PCI Express hierarchy tree, the core supports the following link states of programmed power management (PPM):

  • L0: Active state, data exchange state.
  • L1: Higher latency, lower power standby state.

The PPM protocol is initiated through the upstream port of the downstream component.

  • PPM L0 state

The L0 state represents normal operation and is transparent to your logic. The core reaches the L0 state after a successful initialization and training of the PCI Express link as per the protocol.

  • PPM L1 state

The following steps outline the transition of the core to the PPM L1 state.

  1. The transition to a lower power PPM L1 state is always initiated by an upstream device by programming the PCI Express device power state to non-D0 (in the PM capability in the configuration register space); see the sketch after this list. The current device power state can be read through APB registers.
  2. The integrated block for PCIe stops accepting any further transactions. Any pending transactions are accepted fully and completed later.
  3. The integrated block for PCIe exchanges appropriate power management data link layer packets (DLLPs) with its link partner to successfully transition the link to a lower power PPM L1 state.
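Step 1 above corresponds, on the host side, to a standard write to the PCI power management capability. The following C sketch illustrates the idea only; cfg_read32()/cfg_write32() are hypothetical configuration-space accessors for the target function, and the capability walk and PMCSR PowerState field come from the PCI/PCIe specifications rather than from this subsystem.

    #include <stdint.h>

    /* Hypothetical configuration-space accessors for the target function. */
    uint32_t cfg_read32(uint32_t offset);
    void     cfg_write32(uint32_t offset, uint32_t value);

    #define PCI_CAP_PTR        0x34u  /* Capabilities Pointer in the configuration header */
    #define PCI_CAP_ID_PM      0x01u  /* PCI Power Management capability ID               */
    #define PMCSR_OFFSET       0x04u  /* PMCSR is 4 bytes into the PM capability          */
    #define PMCSR_STATE_MASK   0x3u   /* PowerState field, bits [1:0]                     */
    #define PMCSR_STATE_D3HOT  0x3u

    /* Program a non-D0 device power state (D3hot here) to request PPM L1 entry. */
    static int enter_non_d0(void)
    {
        uint8_t cap = (uint8_t)(cfg_read32(PCI_CAP_PTR) & 0xFCu);

        while (cap != 0) {
            uint32_t hdr = cfg_read32(cap);              /* [7:0] cap ID, [15:8] next pointer */
            if ((hdr & 0xFFu) == PCI_CAP_ID_PM) {
                uint32_t pmcsr = cfg_read32(cap + PMCSR_OFFSET);
                pmcsr = (pmcsr & ~PMCSR_STATE_MASK) | PMCSR_STATE_D3HOT;
                cfg_write32(cap + PMCSR_OFFSET, pmcsr);
                return 0;
            }
            cap = (uint8_t)((hdr >> 8) & 0xFCu);         /* follow the next capability pointer */
        }
        return -1; /* PM capability not found */
    }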

MDB5-DMA

Each controller for PCIe contains a high-performance 8-channel direct memory access (DMA) engine known as MDB5-DMA. Each channel can be programmed for either transmit or receive DMA operation, and can be controlled from either the PCIe or the AXI domain. The RC system CPU, or the EP application CPU, can offload the transfer of large blocks of data to the MDB5-DMA controller, leaving the CPU free to perform other tasks. It can simultaneously perform the following types of memory transactions (shown in the following figure):

Figure 5. MDB5-DMA Operation

DMA Write: Transfer (copy) of a block of data from local (application) memory to remote (link partner) memory.

DMA Read: Transfer (copy) of a block of data from remote (link partner) memory to local (application) memory.

Therefore, MDB5-DMA supports full-duplex operation, processing read and write transfers at the same time and in parallel with normal (non-DMA) traffic. Upon completion of a DMA transfer or an error, the DMA optionally interrupts the local CPU or sends an interrupt MWr (IMWr) to the remote CPU. The DMA is highly configurable, and you can program it through the local DBI or over the PCIe link. MDB5-DMA can operate in non-linked-list mode or linked-list mode. In linked-list mode, MDB5-DMA fetches the transfer control information (called the channel context) for each transfer (block) from a list of MDB5-DMA elements constructed in local memory.
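The channel context mentioned above carries, per block, transfer control information such as the source address, destination address, and transfer size. The structure below is purely illustrative (field names and layout are hypothetical and are not taken from the MDB5-DMA register reference); it only shows how a list of such elements might be laid out in local memory for linked-list mode.

    #include <stdint.h>

    /* Hypothetical linked-list element; the real element format is defined
     * by the MDB5-DMA register reference. */
    struct mdb5_dma_ll_element {
        uint32_t control;   /* illustrative control flags, e.g., interrupt/last-element */
        uint32_t size;      /* transfer size in bytes for this block                    */
        uint64_t src_addr;  /* source address (local or remote, per channel direction)  */
        uint64_t dst_addr;  /* destination address                                      */
    };

    /* A transfer list is an array of elements built in local memory; the
     * channel's linked-list pointer register is then programmed with its base. */
    struct mdb5_dma_ll_element xfer_list[8];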

The features of the MDB5-DMA are as follows:

  • The maximum DMA transfer size is 4 GB, and the minimum transfer size is one byte
  • Support for 8 Read and 8 Write Channels
  • Internal DMA buffers
  • Support for Doorbell register access
  • 256-bit AXI4 memory-mapped data bus
  • Support for SR-IOV
  • Linked-list and non-linked-list modes
  • Full Duplex Operation
  • Separate Tags for DMA Transfers
  • MSI Interrupt Generation
  • Programmable Interrupt Generation
  • Reordering of DMA Transfer completions
  • Channel Assignment to a VF
  • Memory-to-memory transfers only (streaming mode is not supported)

The following paragraphs describe the MDB5-DMA operation. Refer to the following figure for the DMA operations, each of which is associated with an independent thread.

Figure 6. MDB5-DMA Functional Operation

Descriptor Prefetch

MDB5-DMA features an overlay dual-port RAM in the write/read engine, where the linked-list descriptors are queued. This RAM is logically split into per-channel descriptor queues (FIFOs). After a doorbell, MDB5-DMA reads main memory (software visible) and copies descriptors into these per-channel descriptor queues, ahead of the data transfer. The depth of the overlay RAM is the product of the number of configured read/write channels and the per-channel descriptor queue depth (PF_DEPTH). The PF_DEPTH of all channels should be set together at the beginning of the programming stage to avoid unexpected behavior.

The limitations of the prefetch operation are listed below:

  • In linked list mode, MDB5-DMA loads the ELEMENT_PREFETCH value from the link element of the transfer list. The ELEMENT_PREFETCH value must match the length of a contiguous set of descriptors in main memory.
  • The descriptor queue size is the same for all channels
  • The linked list pointer (HDMA_LLP_[LOW|HIGH]_[WR|RD]CH-i) must not be programmed to point to a data element in the middle of a contiguous ELEMENT_PREFETCH block

PCIe DMA Read Transfer

MDB5-DMA injects multiple MRd requests, each of size less than or equal to the minimum of {Max_Read_Request_Size, Max_Payload_Size_Supported}, into the outbound request path, directed towards the remote link partner. The DMA converts the read responses into write requests, aligns them to the application data bus width when the size is greater than the Max Transfer Unit (MTU) (the alignment of the first request guarantees the alignment of the subsequent requests), and transmits them to the local application. When the MDB5-DMA data transfer is complete, the CPU is notified.

PCIe DMA Write Transfer

MDB5-DMA injects multiple MRd requests, each of size less than or equal to Max_Payload_Size, into the inbound request path, directed toward the local application. The DMA converts the read responses into MWr TLPs, aligns the requests to the application data bus width for sizes greater than the Max Transfer Unit (MTU), and transmits them to the remote link partner. When the MDB5-DMA data transfer is complete, the CPU is notified.

Registers and Context Memory

Each channel has its own exclusive register address segment for better support of virtualized systems. As the transfer progresses, MDB5-DMA updates these registers. The user programs the channel context information and other MDB5-DMA control registers directly with the local CPU through the DBI, or remotely over the PCIe link. The per-channel register address segments are not contiguous. By default, adjacent channel register address segments are separated by 0x100 (256 bytes), and consecutive channels of the same direction (read or write) are separated by 0x200 (512 bytes). For example, if read channel 0's address space starts at DREG_BASE, then write channel 0's address space starts at DREG_BASE + 0x100, read channel 1's at DREG_BASE + 0x200, write channel 1's at DREG_BASE + 0x300, and so on, as shown in the following tables and in the sketch after Table 4.

Table 3. PCIe DMA Read Channel Address Map
DMA Read Channel | AXI Address | PCIe Address
0 | DREG_BASE | cfg_dma_reg_bar(1)
1 | DREG_BASE + 0x200 | cfg_dma_reg_bar + 0x200
2 | DREG_BASE + 0x400 | cfg_dma_reg_bar + 0x400
3 | DREG_BASE + 0x600 | cfg_dma_reg_bar + 0x600
4 | DREG_BASE + 0x800 | cfg_dma_reg_bar + 0x800
5 | DREG_BASE + 0xA00 | cfg_dma_reg_bar + 0xA00
6 | DREG_BASE + 0xC00 | cfg_dma_reg_bar + 0xC00
7 | DREG_BASE + 0xE00 | cfg_dma_reg_bar + 0xE00
Table 4. PCIe DMA Write Channel Address Map
DMA Write Channel | AXI Address | PCIe Address
0 | DREG_BASE + 0x100 | cfg_dma_reg_bar + 0x100
1 | DREG_BASE + 0x300 | cfg_dma_reg_bar + 0x300
2 | DREG_BASE + 0x500 | cfg_dma_reg_bar + 0x500
3 | DREG_BASE + 0x700 | cfg_dma_reg_bar + 0x700
4 | DREG_BASE + 0x900 | cfg_dma_reg_bar + 0x900
5 | DREG_BASE + 0xB00 | cfg_dma_reg_bar + 0xB00
6 | DREG_BASE + 0xD00 | cfg_dma_reg_bar + 0xD00
7 | DREG_BASE + 0xF00 | cfg_dma_reg_bar + 0xF00
  1. By default, cfg_dma_reg_bar is BAR0.
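The address maps in Tables 3 and 4 follow a simple stride, captured by the following C sketch. DREG_BASE and cfg_dma_reg_bar stand for the AXI-side and PCIe-side base addresses described above; the helper names are illustrative only.

    #include <stdint.h>

    #define CH_STRIDE     0x200u  /* spacing between consecutive channels of the same direction */
    #define WR_CH_OFFSET  0x100u  /* write channel segment sits 0x100 above the read channel    */

    /* Offset of read channel n's register segment relative to DREG_BASE (AXI)
     * or cfg_dma_reg_bar (PCIe); see Table 3. */
    static inline uint32_t rd_ch_offset(unsigned int n)
    {
        return n * CH_STRIDE;
    }

    /* Offset of write channel n's register segment; see Table 4. */
    static inline uint32_t wr_ch_offset(unsigned int n)
    {
        return WR_CH_OFFSET + n * CH_STRIDE;
    }

    /* Example: read channel 3 is at DREG_BASE + 0x600, write channel 3 at DREG_BASE + 0x700. */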

Interrupts and Error Handling

MDB5-DMA generates the following interrupts per channel:

  • Done: The DMA has successfully completed the transfer
  • Watermark: Interrupt generated at the end of each watermark event (end of a linked-list element). Generated in linked-list mode only
  • Abort: MDB5-DMA has failed to complete the transfer, or an error has occurred during the transfer

The interrupts are signaled to the software on the CPU or to the remote host using MSI.

AXI-PCI Express Bridge

The AXI Bridge exposes the AXI interface of the PCIe controller and DMA to the SMMU/CMN in the PS FPD. It provides AXI-to-PCIe protocol translation and vice-versa, ingress/egress address translation and DMA. Refer to the following figure for the bridge in the PCIe subsystem.

Figure 7. AXI-PCIe Bridge

The features of the AXI4-to-PCIe Bridge are listed below:

  • Support to meet PCIe 5.0 x4 bandwidth
  • Independent 256b AXI-MM Read and Write channels to connect the PCIe controller and DMA IP to the AXI interconnect
  • 64-bit addressing capability on AXI Read and Write channels
  • AXI-Lite configuration interface to access Bridge Registers
  • Interrupt support – MSI, MSI-X, and INTx legacy interrupts
  • MSI Message to Interrupt Pin decoder with interrupt masking support for up to 64 interrupts
  • Generation of configuration transactions through the enhanced configuration access mechanism (ECAM) and messages by the CPU in Root Port mode
  • BAR Remapping support at Egress and Ingress
  • ECC on internal memories as supported by the PCIe
  • TrustZone support

When a remote master issues a transaction over the PCIe link, it appears as an AXI transaction on the AXI master port. When a local master (in the PS) issues an AXI transaction on the slave port, it goes onto the PCIe link as either a memory or a configuration TLP, based on the address translation apertures that are set up.

Details of the AXI master interface:

  • Writes using the same AXI ID (m_awid, m_wid, m_bid) for all write transactions regardless of the source; hence, these should be completed in order on AXI.
  • Reads initiated by the master AXI that do not complete within the specified timeout period are assumed to never complete and result in a completer abort response on the PCIe link.
  • Writes initiated by the master AXI that do not complete within the specified timeout period are assumed to never complete and are terminated.

Details of the AXI slave interface:

  • Transactions that cannot be forwarded to the PCIe interface due to PCIe-specific reasons (for example, the PCIe data link layer is down; PCIe domain is in reset; or when bus master enable = 0 for Endpoint applications) are dropped and completed on AXI with a DECERR status.
  • AXI slave interface initiated reads, I/O writes, and configuration writes that fail to complete within a timeout duration are assumed to never complete and are terminated with a SLVERR response on AXI. When the AXI clock is 250 MHz, the timeout duration is 50 ms. The timeout scales linearly with the AXI clock period; for example, the timeout is 100 ms if the AXI clock is 125 MHz (see the worked example after this list).
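The 50 ms figure at 250 MHz corresponds to a fixed count of 12,500,000 AXI clock cycles, which is consistent with the 100 ms value at 125 MHz. The following worked example (a sketch only, assuming the timeout is implemented as a fixed cycle count) converts that count into a wall-clock timeout for a given AXI clock frequency.

    /* The AXI slave timeout as a fixed cycle count: 50 ms at 250 MHz
     * implies 12,500,000 AXI clock cycles. */
    #define AXI_TIMEOUT_CYCLES 12500000.0

    /* Timeout in milliseconds for a given AXI clock frequency in MHz. */
    static inline double axi_slave_timeout_ms(double axi_clk_mhz)
    {
        return AXI_TIMEOUT_CYCLES / (axi_clk_mhz * 1000.0);
    }

    /* axi_slave_timeout_ms(250.0) == 50.0; axi_slave_timeout_ms(125.0) == 100.0 */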

Bridge registers are accessed through the AXI slave using a bridge register translation. Various registers like the DMA registers, MSI-X table, and pending-bit array are accessed through their respective translations.

The MSI-X table and PBA are applicable only for Endpoint mode and the corresponding registers are implemented in the AXI-PCIe bridge at predefined offsets. Refer to the following figure that shows the AXI and PCIe domain access of various registers in the AXI-PCIe bridge:

Figure 8. AXI-PCIe Bridge Register

The following are registers in the AXI domain:

  • Core registers are accessed through the AXI slave bridge register translation.
  • DMA channel registers are accessed through the AXI slave DMA register translation (BAR0).

Bridge and DMA registers are accessed over the integrated block for PCIe at fixed offsets in the PCIe BAR region.

  • PCIe access to bridge registers is disabled (by default)
  • PCIe access to DMA channel registers is controlled through cfg_dma_reg_bar; this access is enabled by default.

Address Translation

The bridge provides 16 fully-configurable address apertures to support address translation both for ingress (from PCIe to AXI) and egress (from AXI to PCIe) transactions.

  • In an AXI master, up to 16 ingress translation regions can be set up. Translation is done for the PCIe TLPs that are not decoded as MSI or MSI-X interrupts or internal DMA transactions.
  • In an AXI slave, up to 16 translation regions can be set up. Translation is done for AXI transactions destined for PCIe and not PCIe ECAM or any other internal bridge register access.

IMPORTANT: For egress translations, it is important to limit the AXI domain address to the following ranges per the System Address Map:

  • 256 MB region starting at 0xA000_0000.
  • 256 GB region starting at 0x1000_0000_0000.

Only when AXI transactions target these ranges are they routed to the controller for PCIe for further translation by the bridge.

A translation is hit when the following occurs:

  • Translation is enabled (tran_enable == 1)
  • The tran_src_base[63:(12+tran_size)] == source address [63:(12+tran_size)]

On a translation hit, the upper source address bits are replaced with destination base address bits before forwarding the transaction to the destination.

Destination address = {tran_dst_base[63:(12+tran_size)], source address[(11+tran_size):0]}

If translation is valid and security_enable==1 then the following occurs:

  • For ingress, ARPROT/AWPROT on AXI is assigned the value from tz_at_ingr[i] associated with the translation.
  • For egress, if ARPROT/AWPROT from AXI matches the security level of tz_at_egr[i] associated with the translation, then the transaction is forwarded to the PCIe interface. Otherwise, it is discarded with a SLVERR response on AXI.

IMPORTANT: The security values for translation (tz_at_ingr/egr) are programmed at boot time as part of the SLCR_PCIE register under the MMI_SLCR_SECURE register set.

Note: The source/destination address programmed should be aligned to the translation aperture size. For a 64 KB aperture size, the lower 16 bits of the source/destination addresses must be zeros.

If multiple translation hits occur, the translation with the lowest index (lowest translation register address offset for the ingress/egress direction) is used for the transaction.
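A minimal C sketch of the match-and-replace behavior described above, including the lowest-index rule; the structure and field names are illustrative and do not correspond to actual register names.

    #include <stdbool.h>
    #include <stdint.h>

    #define NUM_APERTURES 16

    /* Illustrative view of one translation aperture's programmed fields. */
    struct xlate_aperture {
        bool     tran_enable;
        uint32_t tran_size;      /* aperture covers 2^(12 + tran_size) bytes        */
        uint64_t tran_src_base;  /* must be aligned to the aperture size (see note) */
        uint64_t tran_dst_base;  /* must be aligned to the aperture size (see note) */
    };

    /* Return true and produce the translated address when an aperture hits;
     * when several apertures hit, the lowest-index one wins. */
    static bool translate(const struct xlate_aperture ap[NUM_APERTURES],
                          uint64_t src, uint64_t *dst)
    {
        for (unsigned int i = 0; i < NUM_APERTURES; i++) {
            if (!ap[i].tran_enable)
                continue;
            uint64_t mask = ~((1ULL << (12u + ap[i].tran_size)) - 1u); /* bits [63:(12+tran_size)] */
            if ((src & mask) == (ap[i].tran_src_base & mask)) {
                /* Upper bits come from the destination base; lower bits pass through. */
                *dst = (ap[i].tran_dst_base & mask) | (src & ~mask);
                return true;
            }
        }
        return false; /* no hit: handled per subtractive decode or default rules */
    }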

When operating as an Endpoint, the PCIe BARs are set up by the host PC during enumeration, and the ingress translations required for PCIe-to-AXI translation are set up by the AXI CPU.

Enhanced Configuration Access Mechanism

The bridge implements ECAM to translate AXI read or write transactions to PCIe configuration read or write TLPs. ECAM maps a portion of the AXI memory address space to the PCI Express configuration transactions. A write transaction targeting this region is converted into a PCI Express configuration write transaction and a read transaction targeting this region is converted into a PCI Express configuration read transaction.

The ECAM region is hit when the following occurs:

  • ECAM is enabled (ecam_enable == 1)
  • The ecam_base[63:(12+ecam_size)] == AXI address[63:(12+ecam_size)]

On an ECAM hit, the lower AXI address bits are mapped into the PCI Express configuration transaction as listed in the following table:

Table 5. AXI Address to PCIe Configuration TLP Mapping
AXI Address Bits(1) | PCIe Configuration TLP Field | Notes
AXI address [27:20] | Bus number [7:0] | If ecam_size is set to less than 256 MB, the upper bus number bits that are not controlled by the AXI address are set to 0
AXI address [19:15] | Device number [4:0] | For PCI Express devices implementing an alternative routing ID (ARI), AXI address [19:12] maps to Function number [7:0]
AXI address [14:12] | Function number [2:0] |
AXI address [11:2] | Configuration register DWORD address [11:2] |
  1. AXI address [1:0] along with the AXI transaction size are used to compute the transaction byte enables.

ECAM transactions are not permitted to cross a DWORD address boundary. If a transaction hit to the ECAM region crosses a DWORD address boundary or times out, the transaction is aborted with DECERR status.

Note: The bridge generates SLVERR for ECAM transactions when the link is down. Software is required to check for link up status before sending ECAM transactions. The exception to this is during access of the local root configuration space (bus number = 0) when the PCIe controller is used as the Root Port.
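Table 5 can be read as a simple address-composition rule. The following C sketch builds the AXI address for a conventional (non-ARI) function; ecam_addr() is an illustrative helper, and ecam_base stands for the programmed, size-aligned base of the ECAM aperture.

    #include <stdint.h>

    /* Compose the AXI address for a configuration access to (bus, device,
     * function, register) per Table 5. 'reg' is the DWORD-aligned byte offset
     * of the configuration register (bits [11:2]); non-ARI routing assumed. */
    static inline uint64_t ecam_addr(uint64_t ecam_base,
                                     uint8_t bus, uint8_t dev,
                                     uint8_t fn, uint16_t reg)
    {
        return ecam_base
             | ((uint64_t)bus << 20)            /* AXI address [27:20] = bus number      */
             | ((uint64_t)(dev & 0x1Fu) << 15)  /* AXI address [19:15] = device number   */
             | ((uint64_t)(fn  & 0x07u) << 12)  /* AXI address [14:12] = function number */
             | (uint64_t)(reg & 0xFFCu);        /* AXI address [11:2]  = DWORD address   */
    }

A 32-bit read or write to the resulting address is converted by the bridge into a configuration read or write TLP, subject to the link-up requirement in the note above.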

Generation of Type-0 or Type-1 Configuration Transactions

Type-0 or type-1 configuration transactions are generated when operating as a Root Port to enumerate the PCIe hierarchy. The following summarizes how a type-0 or a type-1 configuration transaction is generated (see also the sketch after this list). The bus, device, and function numbers used in the following description are extracted from the incoming AXI address that hits the ECAM aperture.

  • When the bus number in the ECAM address == PCIe core bus number
    • For device number = 0 and function number = 0, an internal configuration access is generated for the integrated block for PCIe.
    • If either device number or function number is non-zero, the transaction is ended with a DECERR.
  • If the target bus number in an ECAM address == secondary bus number programmed through CSR module.
    • For device number = 0, a type-0 configuration TLP is transmitted.
    • For a non-zero device number, the transaction is ended with DECERR.
  • When an ECAM address targets a bus number that is different from the other options, a type-1 configuration TLP is transmitted.
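The enumeration rules above amount to the following decision, sketched in C; the function and parameter names are illustrative, with pri_bus denoting the PCIe core (primary) bus number and sec_bus the secondary bus number programmed through the CSR module.

    #include <stdint.h>

    enum cfg_target { CFG_INTERNAL, CFG_TYPE0_TLP, CFG_TYPE1_TLP, CFG_DECERR };

    /* Decide how an ECAM access is serviced in Root Port mode. */
    static enum cfg_target route_cfg(uint8_t bus, uint8_t dev, uint8_t fn,
                                     uint8_t pri_bus, uint8_t sec_bus)
    {
        if (bus == pri_bus)
            return (dev == 0 && fn == 0) ? CFG_INTERNAL : CFG_DECERR;
        if (bus == sec_bus)
            return (dev == 0) ? CFG_TYPE0_TLP : CFG_DECERR;
        return CFG_TYPE1_TLP; /* any other bus number behind the link */
    }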

Root Port Received Interrupt and Message Controller

A received interrupt and message controller collects interrupts and messages received from the PCIe hierarchy. Interrupt reception is applicable only to Root Port mode.

The following interrupt output ports are provided and connected to the AXI CPU (PS generic interrupt controller, GIC, in this case):

  • Two interrupt ports for MSI; each interrupt output provides interrupt for 32 vectors
  • One interrupt port for legacy interrupts
  • One interrupt port for DMA

Interrupts

Interrupt generation capability is provided to both the PCIe bus and the system interrupt controllers (see Chapter 13, Interrupts). This section describes the various interrupts.

PCIe Bus Interface Interrupts

As an Endpoint, the controller supports legacy, MSI (multi-vector, up to four), and MSI-X (up to four vectors) interrupt generation. These interrupts (when enabled) are generated by DMA transactions due to the completion of a DMA transfer or due to an error event.

In Endpoint mode, the bridge optionally generates these interrupts.

When MSI-X is enabled, the bridge implements an MSI-X table and PBA at fixed offsets with respect to cfg_dma_reg_bar.

Note: As an Endpoint, when legacy interrupts are used, only INTA is supported.

IMPORTANT: As a Root Port, if an Endpoint sends a non-compliant MSI TLP, it will be dropped. It is required for the first byte-enable field in the MSI TLP to be equal to all ones.

System Interrupts

System interrupts can be generated when the controller for PCIe is used either as a Root Port or as an Endpoint. The five PCIe system interrupts are MSI0, MSI1, INT{A, B, C, D}, DMA, and MSC.

As an Endpoint, the following interrupts can be generated to the system controller:

  • DMA interrupts due to completion or an error when enabled; these are generated when the DMA operation is enabled.
  • Host software controlled interrupts, provided per DMA channel, which can be used for handshake purposes. Because the PCIe protocol does not support interrupts in the downstream direction, these provide a means for the host (Root Port) to interrupt the processor on the Endpoint.

Ingress Transactions

Refer to the following table for the PCIe-AXI transaction mapping:

Table 6. Ingress Transaction Map Table
PCIe Transaction | AXI Transaction | Map Condition
Memory read TLP | AXI read on AXI master port | Translated address if an ingress translation is hit. If no translation is hit and subtractive decode is enabled, the transaction is forwarded to AXI without translation; otherwise, it is an unsupported request.
Memory write TLP | AXI write on AXI master port | Translated address if an ingress translation is hit; otherwise, the same address (no translation) on subtractive decode.
Configuration TLP | Handled internally by the integrated block for PCIe. |
Successful Cpl (CfgWr response) | AXI response OKAY |
Cpl with unsupported request (CfgWr response) | AXI response DECERR |
Cpl with unsupported request (CfgRd response) | AXI response DECERR if cfg_rd_ur_is_ur_ok1s_n = 1; AXI response OKAY with data as all 1's when cfg_rd_ur_is_ur_ok1s_n = 0 |
Cpl with completer abort | AXI response SLVERR |

Egress Transactions

ECAM write and I/O write transactions are non-posted in the PCIe domain. To avoid deadlock conditions, non-posted transactions must not be allowed to stall posted transactions. These non-posted writes require arbitration for a PCIe tag and for the completion-handling resources managed by the bridge's reorder queue; however, they are not queued in the reorder queue and are instead queued into an additional non-posted write FIFO.

Refer to the following table for the AXI transaction mapping to PCIe domain:

Table 7. Egress Transaction Map Table
AXI Transaction | PCIe Transaction | Notes
AXI read transaction | Local bridge register read if the BREG aperture is hit; DMA register read if the DREG aperture is hit; configuration read TLP if the ECAM aperture is hit; memory read TLP if no other aperture is hit. | For a memory read TLP, the address is translated if an egress translation is hit; otherwise, it remains the same.
AXI write transaction | Local bridge register write if the BREG aperture is hit; DMA register write if the DREG aperture is hit; configuration write TLP if the ECAM aperture is hit; memory write TLP if no other aperture is hit. | For a memory write TLP, the address is translated if an egress translation is hit; otherwise, it remains the same.

Note: Special handling is required by software for memory write transactions with ECRC errors. The PCI Express specification mandates that ECRC errors be captured and signaled to the software. The error could be in the payload or in the header. If the payload is corrupted, a known location could receive incorrect data. If the address is corrupted, the transaction could end up at a completely incorrect slave. Software is required to read the header from the AER registers in the PCIe configuration space and take corrective action because, by the time software receives notification of such an event, the write transaction with the ECRC error could already be executed.
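The corrective action mentioned in the note starts with locating the AER extended capability and reading its header log. A minimal sketch follows, assuming a hypothetical cfg_read32() accessor for the function's configuration space; the extended capability ID and the header log offset are those defined by the PCI Express specification.

    #include <stdint.h>

    uint32_t cfg_read32(uint32_t offset);   /* hypothetical configuration-space accessor */

    #define PCIE_EXT_CAP_START   0x100u     /* extended capabilities begin at offset 0x100 */
    #define PCIE_EXT_CAP_ID_AER  0x0001u
    #define AER_HEADER_LOG_OFF   0x1Cu      /* four DWORDs of the logged TLP header        */

    /* Read the four header-log DWORDs of the AER capability into hdr[4].
     * Returns 0 on success, -1 if the AER capability is not present. */
    static int read_aer_header_log(uint32_t hdr[4])
    {
        uint32_t off = PCIE_EXT_CAP_START;

        while (off != 0) {
            uint32_t cap = cfg_read32(off);              /* [15:0] ID, [31:20] next offset */
            if ((cap & 0xFFFFu) == PCIE_EXT_CAP_ID_AER) {
                for (int i = 0; i < 4; i++)
                    hdr[i] = cfg_read32(off + AER_HEADER_LOG_OFF + 4u * (uint32_t)i);
                return 0;
            }
            off = (cap >> 20) & 0xFFCu;                  /* next capability offset         */
        }
        return -1;
    }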

Endpoint Compliance

When PCIe Configuration Space Tests (PCIECV) are performed for PCI-SIG(R) compliance, the Endpoint drivers on a host system are not installed. Likewise, when using the Versal AI Edge Series Gen 2 and Versal Prime Series Gen 2 as an Endpoint for a PCIECV test, any driver accessing AXI-PCIe bridge registers running on the Versal AI Edge Series Gen 2 and Versal Prime Series Gen 2 (APU or RPU clusters) should not be installed.

If the driver running on the Versal AI Edge Series Gen 2 and Versal Prime Series Gen 2 accesses the AXI-PCIe bridge registers, it can cause the transaction pending bit (in the PCIe configuration space) to be set, which would cause a PCIECV compliance failure.

PCIe Configuration Space

The configuration space is a register space defined by each revision of the PCI Express Base Specification. The PCIe controller supports up to 8 Physical Functions (PFs) and 64 Virtual Functions (VFs).

The PCI configuration space consists of the following primary sections:

Legacy PCI Type 0/1 Configuration Space Header

  • Type 0 Configuration Space Header supported for Endpoint configuration
  • Type 1 Configuration Space Header supported for Root Port configuration

Configuration Space Capabilities

  • PCIe capability
  • Power management capability
  • Message signaled interrupt (MSI) capability
  • MSI-X capability