Architecture Overview
The features of the 10G Ethernet MAC are as follows:
- 1G/2.5G/5G/10G single-lane Ethernet interface with a 128-bit AXI channel to the FPD SMMU/CMN in the PS
- Integrated 1000BASE-X and USXGMII PCS modules
- A high-performance DMA with advanced AXI offloading capabilities and descriptor caching
- Four priority queues
- QoS, and IEEE 1588
- Time Sensitive Networking/Audio-Video Bridging (TSN/AVB)
- PTP
- Support for jumbo frames (max size 10K)
- IEEE 802.3az Energy-Efficient Ethernet (EEE)
- VLAN
- Advanced TCP/IP Offload
- Programmable Inter Packet GAP (IPG) Stretch
- MDIO interface for physical layer management of an external PHY device
- IEEE 802.1CB
- IEEE 802.1 Qci Receive Traffic Policing
- IEEE 802.3bz/802.3cb 2.5G support
Functional Description
| Clock | Frequency Req from Spec (MHz) | Targeted Freq | Source | Description |
|---|---|---|---|---|
| ACLK | 150 – 400 | 300 MHz | Sysmon_clk | AMBA AXI clock used by DMA block |
| PCLK | 150 – 400 | 150 MHz | lsbus_clk | AMBA APB clock used by configuration |
| TXCLK | 125 (1G), 312.5 (2.5G), 78.125 (5G), 156.25 (10G) | | 156.25 refclk to GTYPLL for 5G/10G; TX/RXOUTCLK for 1G/2.5G | MAC transmit clock. Frequency must be locked to gtx20_clk for 5G and 10G |
| RXCLK | 62.5 (1G), 156.25 (2.5G), 80.56 (5G), 161.1328 (10G) | | RXOUTCLK | MAC receive clock. For USXGMII, used as the oversampling clock: 80-100 MHz for 5G, 160-200 MHz for 10G |
| N_TX_CLK | 125 (1G), 312.5 (2.5G), 78.125 (5G), 156.25 (10G) | | GPU PLL | Inverted tx_clk used for the loopback module. For loopback, tx_clk domain signals are re-timed to this clock before being passed to the rx_clk domain receiver inputs. For internal loopback operation, this clock must be the same frequency as tx/rx_clk |
| PCS_RX_CLK | 62.5 (1G), 156.25 (2.5G), 80.56 (5G), 161.1328 (10G) | | RXOUTCLK | Clock used in the PCS receive |
| GTX20_CLK | 62.5 (1G), 156.25 (2.5G), 80.56 (5G), 161.1328 (10G) | | TXOUTCLK | PCS transmit clock used at the PCS to PHY interface |
| GTX_CLK | 125 (1G), 312.5 (2.5G) | | TXOUTCLK | PCS transmit clock. Used by the 8b10b PCS. This clock is not used by 5G or 10G |
| TSU_CLK | 100 | 100 | TSU_CLK from PS/Fabric | Alternate clock source for TSU. Frequency should be greater than tx_clk or rx_clk for high speed mode. Otherwise, it should be greater than 1/8 of the tx/rx_clk |
Refer to the following figure for the 10GbE controller block diagram.
- See the MIO-at-a-Glance Tables for MIO pins locations that support the MDIO signals to the external PHY.
The following sections contain detailed description of the functional blocks.
MAC Transmitter Block
The MAC transmitter operates in full duplex and transmits frames in accordance with the Ethernet IEEE 802.3 standard.
A small input buffer receives data through the external FIFO interface (from either the DMA module or external to the IP Core) which, depending on the dma_bus_width control bits in the network configuration register, will extract data in either 32, 64, or 128-bit form. All subsequent processing prior to the final output is performed in bytes.
Transmit data can be output using the GMII/MII interface or through the TBI. If the TBI is selected (gigabit and SGMII modes only), the MAC transmitter passes 8-bit data to the PCS for further processing prior to output on the TBI. In 10 and 100 Mbps SGMII mode, the MAC is clocked slower than the PCS, so the PCS samples the same 8-bit data 100 or 10 times, respectively.
Frame assembly starts by adding the preamble and the start frame delimiter. Data is taken from the transmit FIFO interface a word at a time. When the 10GbE is configured for gigabit operation, the data output to the PHY uses all 8 bits of the output. In 10/100M mode, transmit data to the PHY is nibble wide, least significant nibble first, using txd[3:0] with txd[7:4] tied to logic 0. If necessary, padding is added to take the frame length to 60 bytes. The CRC is calculated using a 32-bit (order-32) polynomial, inverted, and appended to the end of the frame, taking the frame length to a minimum of 64 bytes.
If the no CRC bit is set in the second word of the last buffer descriptor of a transmit frame, neither pad nor CRC are appended. The no CRC bit can also be set through the FIFO interface.
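The pad-and-CRC step described above can be sketched in software as follows. This is only an illustrative model, not the hardware implementation: the function name and arguments are hypothetical, the preamble and start frame delimiter are left out, and it assumes the FCS is the standard Ethernet CRC-32 (which `zlib.crc32` computes, final inversion included) appended least significant byte first.

```python
import struct
import zlib

MIN_FRAME_NO_FCS = 60  # pad target from the text, in bytes (FCS excluded)


def assemble_frame(dst: bytes, src: bytes, ethertype: int, payload: bytes,
                   no_crc: bool = False) -> bytes:
    """Return the frame bytes that follow the preamble and SFD (hypothetical helper)."""
    if len(dst) != 6 or len(src) != 6:
        raise ValueError("MAC addresses must be 6 bytes")
    frame = dst + src + struct.pack("!H", ethertype) + payload
    if no_crc:
        # With the no CRC bit set, neither pad nor CRC is appended.
        return frame
    if len(frame) < MIN_FRAME_NO_FCS:
        frame += bytes(MIN_FRAME_NO_FCS - len(frame))  # zero padding
    fcs = zlib.crc32(frame)  # CRC-32 with the final inversion already applied
    return frame + struct.pack("<I", fcs)  # assumption: FCS sent LSB first
```

A two-byte payload is padded to 60 bytes and leaves the function as a 64-byte frame; with `no_crc=True` the same call returns only the 16 bytes supplied.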
In full-duplex mode (at all data rates), frames are transmitted immediately. Back to back frames are transmitted at least 96 bit times apart to guarantee the interframe gap.
When bit 28 is set in the network configuration register, the Inter Packet Gap (IPG) may be stretched beyond 96 bits depending on the length of the previously transmitted frame and the value written to the IPG_STRETCH register. The least significant 8 bits of the IPG_STRETCH register multiply the previous frame length (including preamble); the next significant 8 bits (+1, so as not to get a divide by zero) divide the frame length to generate the IPG. IPG stretch only works in full-duplex mode and when bit 28 is set in the network configuration register. The IPG_STRETCH register cannot be used to shrink the IPG below 96 bits.
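A rough model of the IPG stretch calculation is shown below. The text does not state the units or the rounding, so this sketch assumes the previous frame length and the resulting IPG are both expressed in bit times, that the multiply and divide combine as length × multiplier / (divisor + 1), and that the division truncates. The function name is hypothetical.

```python
MIN_IPG_BITS = 96  # the IPG can never be shrunk below this


def stretched_ipg(prev_frame_len_bits: int, ipg_stretch: int,
                  stretch_enable: bool = True) -> int:
    """Estimate the IPG in bit times (assumed units) after a transmitted frame."""
    if not stretch_enable:  # bit 28 of the network configuration register clear
        return MIN_IPG_BITS
    multiplier = ipg_stretch & 0xFF           # least significant 8 bits
    divisor = ((ipg_stretch >> 8) & 0xFF) + 1  # next 8 bits, +1 avoids divide by zero
    stretched = (prev_frame_len_bits * multiplier) // divisor
    return max(MIN_IPG_BITS, stretched)
```

For example, with `ipg_stretch = 0x0301` (multiplier 1, divisor field 3) a 512-bit frame gives 512 / 4 = 128 bit times, while a larger divisor field such as `0x0F01` would give 32 and is therefore clamped to 96.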
MAC Receive Block
All processing within the MAC receive block is implemented using a 16-bit datapath. The MAC receive block checks for valid preamble, FCS, alignment, and length, presents received frames to the external FIFO interface (to either the DMA module or external to the IP core), and stores the frame's destination address for use by the address checking block.
If, during frame reception, the frame is found to be too long, a bad frame indication is sent to the FIFO. The receiver logic ceases to send data to memory as soon as this condition occurs. At end of frame reception the receive unit indicates to the DMA controller whether the frame is good or bad. The DMA controller recovers the current receive buffer if the frame is bad.
Ethernet frames are normally stored in memory via the DMA unit or to the FIFO complete with the FCS. Setting the FCS remove bit in the network configuration register (bit [17]) causes frames to be stored without their corresponding FCS. The reported frame length field is reduced by four bytes to reflect this operation.
The receive block signals the register block to increment the alignment, CRC (FCS), short frame, long frame, jabber, or receive symbol error statistics when any of these exception conditions occur. If bit [26] is set in the network configuration register, CRC errors are ignored and frames with CRC errors are not discarded, though the frame check sequence errors statistic register is still incremented. Bit [13] of the receive descriptor word [1] is updated to indicate the FCS validity for the particular frame. This is useful for applications where individual frames with FCS errors must be identified. Received frames can be checked for length field errors by setting the length field error frame discard bit of the network configuration register (bit [16]). When this bit is set, the receiver compares a frame's measured length with the length field (bytes 13 and 14) extracted from the frame.
The frame is discarded if the measured length is shorter. This check applies to received frames between 64 bytes and 1,518 bytes in length (1,536 bytes if bit [8] is set in the network configuration register, or 10,240 bytes if bit [3] is set in the network configuration register). Each discarded frame is counted in the 10-bit length field error statistics register.
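The length field check can be illustrated with a small software model. Several details here are assumptions rather than statements from this document: the measured length is taken to be the frame length minus the 14-byte header and 4-byte FCS, and values of 0x0600 or above in bytes 13 and 14 are treated as a type ID (the usual IEEE 802.3 convention) and skipped. The function name is hypothetical.

```python
ETHERTYPE_MIN = 0x0600  # values at or above this are type IDs, not lengths
HEADER_LEN = 14         # destination + source + length/type
FCS_LEN = 4


def has_length_field_error(frame: bytes, max_len: int = 1518) -> bool:
    """Return True if the frame (FCS included) would be discarded by the check."""
    if not 64 <= len(frame) <= max_len:
        return False  # outside the range the check applies to
    length_field = int.from_bytes(frame[12:14], "big")  # bytes 13 and 14
    if length_field >= ETHERTYPE_MIN:
        return False  # assumption: type IDs are not length-checked
    measured = len(frame) - HEADER_LEN - FCS_LEN
    return measured < length_field
```

Pass `max_len=1536` or `max_len=10240` to model the cases where bit [8] or bit [3] of the network configuration register is set.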
MAC Filtering Block
The MAC filter determines which frames should be written to the AXI interface FIFO and on to the DMA controller. Whether a frame is passed depends on what is enabled in the network configuration register, the state of the I/O matching signals, the contents of the specific address, type, and hash registers, and the frame's destination address and type field.
Ethernet frames are transmitted a byte at a time, least significant bit first. The first six bytes (48 bits) of an Ethernet frame make up the destination address. The first bit of the destination address, which is the LSB of the first byte of the frame, is the group or individual bit. This is one for multicast addresses and zero for unicast. The all ones address is the broadcast address and a special case of multicast.
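The group/individual bit rule above maps directly onto a byte test, as in this minimal sketch (the function name is illustrative only):

```python
BROADCAST = b"\xff" * 6


def classify_destination(addr: bytes) -> str:
    """Classify a 6-byte destination address as unicast, multicast, or broadcast."""
    if len(addr) != 6:
        raise ValueError("destination address must be 6 bytes")
    if addr == BROADCAST:
        return "broadcast"  # all ones: a special case of multicast
    # The group/individual bit is the LSB of the first byte of the frame.
    return "multicast" if addr[0] & 0x01 else "unicast"
```

For instance, `01:00:5e:00:00:01` has the LSB of its first byte set and is classified as multicast, while `00:0a:35:01:02:03` is unicast.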
- Specific address register bottom − Stores the first four bytes of the compared source or destination address.
- Specific address register top − Contains the last two bytes of this address, a control bit to select between source or destination address filtering and a 6-bit byte mask field to allow the user to mask bytes during the comparison.
The destination address of received frames is compared against the data stored in the specific address registers once activated. The addresses are deactivated at reset or when their corresponding specific address register bottom is written. They are activated when the specific address register top is written. If a receive frame address matches an active address, the frame is written to the FIFO and on to the DMA controller, if used.
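The write ordering described above (bottom first, which deactivates the entry, then top, which activates it) can be sketched as follows. The register names are placeholders, and the byte packing is an assumption: the first address byte is placed in bits [7:0], which is how the Linux macb driver packs the address. The control bit and byte mask fields in the top register are left at zero because their positions are not given here.

```python
def specific_address_writes(mac: bytes) -> list:
    """Return (register, value) writes, in order, to load one specific address."""
    if len(mac) != 6:
        raise ValueError("MAC address must be 6 bytes")
    bottom = int.from_bytes(mac[0:4], "little")  # first four bytes
    top = int.from_bytes(mac[4:6], "little")     # last two bytes
    # Writing bottom deactivates the entry; writing top re-activates it,
    # so the comparison never runs against a half-updated address.
    return [("spec_add_bottom", bottom), ("spec_add_top", top)]
```

With `00:0a:35:01:02:03` this yields a bottom value of `0x01350a00` followed by a top value of `0x0302`.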
Frames can be filtered using the type ID field for matching. Four type ID registers exist in the register address space, and each can be enabled for matching by writing a one to the MSB (bit [31]) of the respective register. When a frame is received, the matching is implemented as an OR function of the various types of match.
The contents of each type ID register (when enabled) are compared against the length/type ID of the frame being received (for example, bytes 13 and 14 in non-VLAN and non-SNAP encapsulated frames), and the frame is copied to memory if a match is found. If receive checksum offload is disabled, the encoded type ID match bits (word 1, bit [22] and bit [23]) in the receive buffer descriptor status are set to indicate which type ID register generated the match. The reset state of the type ID registers is zero, so each is initially disabled.
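A simple model of the type ID comparison is given below. It assumes the match value occupies the low 16 bits of each register and that the lowest-numbered matching register is the one reported; neither detail is stated in this section, and the function name is hypothetical.

```python
TYPE_ID_ENABLE = 1 << 31  # MSB enables the register for matching


def match_type_id(type_id_regs, frame_type: int):
    """Return the index of the first enabled register matching frame_type, else None."""
    for index, reg in enumerate(type_id_regs):
        enabled = bool(reg & TYPE_ID_ENABLE)
        if enabled and (reg & 0xFFFF) == frame_type:  # assumed value field [15:0]
            return index
    return None
```

A register that holds the right value but has bit [31] clear never matches, which is why the all-zero reset state leaves every register disabled.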
Physical Coding Sub Layer PCS and PHY Interface
The 10G Ethernet MAC (10GbE) supports speeds ranging from 10 Mb/s to 10 Gb/s. The PHY interface is USXGMII/SGMII, and the RAW data mode inside the PS-GTYP PHY is used. The 10GbE contains two MACs: one supporting 1G/2.5G with a data width of 20 bits, and a second supporting 5G/10G with a data width of 64 bits. Each MAC has a corresponding physical coding sublayer (PCS), which includes 8b/10b or 64b/66b encoding respectively. Each MAC connects to its PCS through an internal XGMII interface. Only one PCS is enabled at any given time. Because the encoding is performed inside the 10GbE, the PS-GTYP PCS must be bypassed. The 10GbE core interfaces to the PS-GTYP through a SerDes interface with a maximum data width of 64 bits.
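The clock frequencies listed for the PCS and MAC clocks appear to follow from the serial line rate divided by the number of bits moved per clock. The arithmetic below is a reading of those figures, not a statement from this document: it assumes the standard encoded line rates (1.25, 3.125, 5.15625, and 10.3125 Gbaud), a 20-bit or 64-bit SerDes word, and 10 or 66 encoded bits per MAC-side transfer.

```python
# Encoded serial line rates in baud (assumed standard values).
LINE_RATE_BAUD = {
    "1G": 1.25e9,       # 8b/10b
    "2.5G": 3.125e9,    # 8b/10b
    "5G": 5.15625e9,    # 64b/66b
    "10G": 10.3125e9,   # 64b/66b
}


def word_clock_mhz(speed: str, bits_per_clock: int) -> float:
    """Clock in MHz needed to move bits_per_clock encoded bits per cycle."""
    return LINE_RATE_BAUD[speed] / bits_per_clock / 1e6


for speed, serdes_bits, mac_bits in (("1G", 20, 10), ("2.5G", 20, 10),
                                     ("5G", 64, 66), ("10G", 64, 66)):
    print(speed,
          round(word_clock_mhz(speed, serdes_bits), 4),
          round(word_clock_mhz(speed, mac_bits), 4))
```

Under these assumptions, the 20-bit and 64-bit word clocks come out at 62.5, 156.25, about 80.566, and about 161.1328 MHz, and the 10-bit and 66-bit clocks at 125, 312.5, 78.125, and 156.25 MHz.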
Direct Memory Access DMA
The DMA is implemented using a packet buffering mode using external SRAM memories. The DMA block connects to external memory through its AMBA AXI bus interface. A datapath bus width of 128 bits is supported at all data rates. The DMA uses a scatter-gather method for data transactions: it gathers data to be transmitted from transmit data buffers in system memory and scatters received data to receive data buffers in system memory. Receive or transmit frames are stored in one or more DMA buffers. The receive buffer size is programmable between 64 bytes and 16 KB. Transmit buffers range in length between 0 and 16380 bytes, and up to 128 buffers are permitted per frame. The DMA block manages the transmit and receive frame buffer queues.
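The transmit buffer limits above constrain how software may split a frame across buffers. The following sketch shows one way to compute the per-buffer lengths while respecting those limits; the function name and the equal-chunk policy are illustrative choices, not part of the DMA specification.

```python
MAX_TX_BUFFER_BYTES = 16380   # largest transmit buffer length
MAX_BUFFERS_PER_FRAME = 128   # most buffers permitted for one frame


def split_into_tx_buffers(frame_len: int,
                          buffer_bytes: int = MAX_TX_BUFFER_BYTES) -> list:
    """Return the list of buffer lengths used to gather one transmit frame."""
    if not 0 < buffer_bytes <= MAX_TX_BUFFER_BYTES:
        raise ValueError("buffer size out of range")
    lengths = []
    remaining = frame_len
    while remaining > 0:
        chunk = min(remaining, buffer_bytes)
        lengths.append(chunk)
        remaining -= chunk
    if len(lengths) > MAX_BUFFERS_PER_FRAME:
        raise ValueError("frame needs more than 128 buffers")
    return lengths
```

A 10,240-byte jumbo frame split into 4,096-byte buffers uses three buffers of 4096, 4096, and 2048 bytes; the same frame split into 64-byte buffers would need 160 and is rejected.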
Time Stamp Unit TSU
- The 48 upper bits count seconds
- The next 30 lower bits count nanoseconds
- The lowest 24 bits count sub nanoseconds
- The 54 lower bits roll over when they have counted to one second
The timer increments by a programmable period (to approximately 59.6 attosecond resolution, 5.96E-17 s) with each PCLK or TSU_CLK period, and can also be adjusted in 1 ns steps (incremented or decremented) through register accesses.
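The field widths above imply a sub-nanosecond resolution of 2^-24 ns, which is where the 59.6 attosecond figure comes from. The sketch below models one timer tick and derives the per-clock increment for a given clock frequency; the function names are hypothetical and the rounding policy is an assumption.

```python
SUBNS_BITS = 24
SUBNS_MASK = (1 << SUBNS_BITS) - 1
SEC_MASK = (1 << 48) - 1
NS_PER_SEC = 1_000_000_000

RESOLUTION_S = 2.0 ** -SUBNS_BITS * 1e-9  # about 5.96e-17 s


def increment_for_clock(freq_hz: float):
    """Return (ns, sub-ns) increment per clock period, rounded to nearest."""
    period_fixed = round(NS_PER_SEC * (1 << SUBNS_BITS) / freq_hz)
    return period_fixed >> SUBNS_BITS, period_fixed & SUBNS_MASK


def tsu_tick(sec: int, ns: int, subns: int, inc_ns: int, inc_subns: int):
    """Advance the timer by one clock period and return the new fields."""
    total_subns = subns + inc_subns
    ns += inc_ns + (total_subns >> SUBNS_BITS)  # carry out of sub-nanoseconds
    subns = total_subns & SUBNS_MASK
    if ns >= NS_PER_SEC:                        # lower 54 bits roll over at 1 s
        ns -= NS_PER_SEC
        sec = (sec + 1) & SEC_MASK
    return sec, ns, subns
```

For a 100 MHz TSU_CLK the increment is exactly 10 ns with no sub-nanosecond part; clocks whose period is not a whole number of nanoseconds use the 24-bit fraction to avoid accumulating drift.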
PL External FIFO
To enable user applications that require direct access to the Ethernet packets, a PL interface to the External FIFO port on the 10GbE controller is required. The 10GbE IP allows the DMA and the External FIFO interface to coexist, with run-time selection between them through the GEM registers.
Management Data Input/Output (MDIO)
An MDIO interface is provided to allow the 10GbE to access the external PHY's management registers. This interface is controlled by the PHY management register. Writing to this register causes a PHY management frame to be sent to the PHY over the MDIO interface. PHY management frames are used to either write or read the PHY's control and status registers. Enabling MDIO functions requires the use of three MIO pins. The MDIO interface from the 10GbE shall be routed to three MIO pins in the PSXC.
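As an illustration of what a PHY management frame contains, the sketch below builds the 32-bit frame defined by IEEE 802.3 Clause 22. Whether the PHY management register on this controller uses exactly these bit positions, or whether a given PHY requires Clause 45 frames instead, is not stated here and should be treated as an assumption. The function name is hypothetical.

```python
ST_CLAUSE22 = 0b01   # start of frame
OP_WRITE = 0b01
OP_READ = 0b10
TA = 0b10            # turnaround pattern


def clause22_frame(op: int, phy_addr: int, reg_addr: int, data: int = 0) -> int:
    """Pack a Clause 22 management frame: ST | OP | PHYAD | REGAD | TA | DATA."""
    if op not in (OP_READ, OP_WRITE):
        raise ValueError("op must be OP_READ or OP_WRITE")
    if not (0 <= phy_addr < 32 and 0 <= reg_addr < 32):
        raise ValueError("PHY and register addresses are 5 bits each")
    return ((ST_CLAUSE22 << 30) | (op << 28) | (phy_addr << 23)
            | (reg_addr << 18) | (TA << 16) | (data & 0xFFFF))
```

Writing `0x1140` to register 0 of the PHY at address 1 gives the word `0x50821140`; a read of register 1 on the same PHY gives `0x60860000`, with the data field left at zero for the PHY to fill in.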
For system designers using an external physical PHY for MMI 10G Ethernet (instead of an SFP module), note that there is no dedicated MDIO control. You must use the existing MDIO interface from GEMn via PMC_MIO[50:51] and LPD_MIO[24:25] or LPD_MIO[21:22] on the PMC/LPD MIO banks to access the external PHY control registers. The board schematic should reflect this MIO usage. See the PMC MIO Pin Tables and LPD MIO Pin Table.
Software modifications may be necessary at the application level to enable communication with the external PHY over these MDIO lines. The existing Linux Ethernet drivers support MDIO line sharing among multiple LPD GEMs. This same mechanism is used to share LPD GEMn MDIO with the MMI 10G interface for external PHY access.
The shared MDIO configuration is typically defined in the device tree, which is used by the software to implement the common MDIO solution. AMD provides device tree files for public evaluation boards, while customers must create their own based on their hardware. Documentation and an example for common MDIO configuration are available here: https://xilinx-wiki.atlassian.net/wiki/spaces/A/pages/18841740/Macb+Driver#MacbDriver-CommonMDIODT.
Interfaces
Below is a list of interfaces that will be implemented in the 10GbE IP:
| Interfaces | Connecting to | Description |
|---|---|---|
| APB | Int_Wrap | Used for TSU and configuration |
| AXI4 | Int_Wrap | 128-bit AXI4 interface used for DMA communication |
| MDIO | PS | 4-pin interface used by the 10GbE to communicate with the external PHY |
| TX/RX External FIFO | PL | This is a FIFO interface that can be used for debugging the data from the MAC |
| TSU | PS->PL | IEEE 1588 PTP frame recognition |
| TX/RX DPRAM | Memory in MMI | Buffers for DMA |
| USXGMII/SGMII | PS-GTYP | Data and control to/from the PHY. The data bus is 64 bits wide. For 1G/2.5G, only the lower 20 bits are used (upper bits are packed to 0) |
| Control/Status | PS/SCLR | Status outputs or control inputs |
| Interrupts | Int_Wrap | Interrupts |