10 GbE is supplied with an optional DMA interface. When 10 GbE is configured to use the DMA, the DMA is attached to the MAC module’s external FIFO interfaces to provide a scatter-gather capability for packet data storage in an embedded processor system or system on chip (SoC).
In the high-speed connectivity module, the 10 GbE is always set to packet buffering mode, which uses the external dual-port SRAM for DMA storage. DMA transfers use an AXI4 interface with a 128-bit data bus and a 64-bit address bus.
DMA Features:
DMA Endianism
By default, the DMA uses little-endian format. The endianism can be swapped using bits 6 and 7 of the DMA configuration register (0x0010): bit 6 controls the endianism of the management operations and bit 7 controls the endianism of the data operations.
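As a rough illustration of the bit positions described above, the following Python sketch computes a new DMA configuration register value. The constant and function names are hypothetical, and the sketch assumes that setting a bit selects the swapped (non-default) endianism. It only manipulates an integer and does not access hardware.

```python
# Hypothetical names; bit positions are taken from the text above.
DMA_CONFIG_OFFSET = 0x0010        # DMA configuration register offset
ENDIAN_SWAP_MANAGEMENT = 1 << 6   # bit 6: management (descriptor) operations
ENDIAN_SWAP_DATA = 1 << 7         # bit 7: data operations


def set_endian_swap(dma_config, swap_management, swap_data):
    """Return dma_config with bits 6 and 7 updated; other bits are preserved.

    Assumption: a set bit means the endianism is swapped from the
    little-endian default.
    """
    value = dma_config & ~(ENDIAN_SWAP_MANAGEMENT | ENDIAN_SWAP_DATA)
    if swap_management:
        value |= ENDIAN_SWAP_MANAGEMENT
    if swap_data:
        value |= ENDIAN_SWAP_DATA
    return value & 0xFFFFFFFF
```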
DMA Transactions
The DMA uses separate transmit and receive lists of buffer descriptors, with each descriptor describing a buffer area in memory. This can allow Ethernet packets to be broken up and scattered around the system memory (multi-buffer operation), although one buffer per frame is also permitted. The arbitration scheme that is deployed in GEM is least-recently-granted and fixed priority.
Transfer size can be programmed to 32-, 64-, or 128-bit words using the DMA bus width select bits in the network configuration register, and burst size can be programmed to single access or bursts of 4, 8, 16, or 256 words using the DMA configuration register. 256-word bursts are not supported with an AXI3 interface. The DMA adheres to the AXI4 requirement that a transaction must not cross a 4 KB address boundary; if a burst would cross such a boundary, the DMA splits it into two bursts. The maximum length for a jumbo frame is 16,320 bytes due to the limitation in the length field of the DMA descriptor.
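The 4 KB boundary rule can be modeled in a few lines of Python. This is only a behavioral sketch of the splitting arithmetic, not a description of the DMA's internal logic, and the function name is hypothetical. With a 128-bit bus, a 256-word burst is 4096 bytes, so a single burst produces at most two pieces.

```python
BOUNDARY = 4096  # AXI4: a transaction must not cross a 4 KB address boundary


def split_at_4k(addr, length):
    """Split a transfer of `length` bytes starting at byte address `addr`
    into (address, length) pieces, none of which crosses a 4 KB boundary."""
    pieces = []
    while length > 0:
        room = BOUNDARY - (addr % BOUNDARY)  # bytes left before the boundary
        n = min(length, room)
        pieces.append((addr, n))
        addr += n
        length -= n
    return pieces
```

For example, a 64-byte burst starting 16 bytes below a boundary is issued as a 16-byte piece followed by a 48-byte piece.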
When the DMA is configured to use SRAM-based packet buffers, it can be programmed into a low-latency mode, known as partial store and forward, which is enabled via the TX and RX partial store and forward programmable registers. When the transmit partial store and forward mode is activated, the transmitter will only begin to forward the packet to the MAC when there is enough packet data stored in the packet buffer. Similarly, when the receive partial store and forward mode is activated, the receiver will only begin to forward the packet to the external AXI slave when enough packet data is stored in the packet buffer.
Receive DMA Buffers
Received frames, optionally including FCS, are written to receive buffers stored in AXI memory. The start location for each receive buffer is stored in memory in a list of receive buffer descriptors at an address location pointed to by the receive buffer queue pointer. The base address for the receive buffer queue pointer is configured in software using the receive buffer queue base address register(s).
The number of words in each buffer descriptor (BD) depends on the operating mode. Each BD word is 32 bits wide. The first two words (word 0 and word 1) are used for all BD modes.
In extended buffer descriptor modes (DMA configuration register bit 28 = 1), two buffer descriptor words are added for 64-bit addressing mode and two buffer descriptor words are added for time-stamp capture. Therefore, there are either two, four, or six buffer descriptor words in each buffer descriptor entry depending on operating mode, and every buffer descriptor entry has the same number of words.
- Every descriptor is 64 bits wide when 64-bit addressing is disabled and the descriptor time-stamp capture mode is disabled.
- Every descriptor is 128 bits wide when 64-bit addressing is enabled and the descriptor time-stamp capture mode is disabled.
- Every descriptor is 128 bits wide when 64-bit addressing is disabled and the descriptor time-stamp capture mode is enabled.
- Every descriptor is 192 bits wide when 64-bit addressing is enabled and the descriptor time-stamp capture mode is enabled.
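The descriptor sizes listed above follow from a simple rule: two base words, plus two for 64-bit addressing, plus two for timestamp capture. The Python sketch below illustrates that arithmetic and the resulting address of the n-th descriptor in a list. The function names are hypothetical, and the sketch assumes descriptors are packed back to back starting at the queue base address.

```python
BD_WORD_BYTES = 4  # each buffer descriptor word is 32 bits


def bd_words(addr64, timestamp):
    """Number of 32-bit words per buffer descriptor entry."""
    return 2 + (2 if addr64 else 0) + (2 if timestamp else 0)


def bd_address(base, index, addr64, timestamp):
    """Byte address of descriptor `index`, assuming descriptors are
    contiguous from `base` (an assumption of this sketch)."""
    return base + index * bd_words(addr64, timestamp) * BD_WORD_BYTES
```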
| Bit | Function |
|---|---|
| Word 0 | |
| 31:3 | Address [31:3] of beginning of buffer |
| 2 | Address [2] of beginning of buffer. In extended buffer descriptor mode (DMA configuration register[28] = 1), this bit instead indicates a valid timestamp in the BD entry. |
| 1 | Wrap - marks last descriptor in receive buffer descriptor list. |
| 0 | Ownership - needs to be zero for GEM to write data to the receive buffer. GEM sets this to 1 once it has successfully written a frame to memory. Software has to clear this bit before the buffer can be used again. |
| Word 1 | |
| 31 | Global all ones broadcast address detected. |
| 30 | Multicast hash match. |
| 29 | Unicast hash match. |
| 28 | External address match. Note: in packet buffer mode, if the number of configured specific address filters in gem_gxl_defs.v is greater than four, external address matching is not reported in this bit; instead, the bit is set if there has been a match in the first eight specific address registers. Bit 27 is then used along with bits 26:25 to indicate which register matched. |
| 27 | Specific address register match found. Bits 26:25 indicate which specific address register caused the match. See the note for bit 28 above. |
| 26:25 | Specific address register match. Encoded as follows: • 00 - Specific address register 1 match • 01 - Specific address register 2 match • 10 - Specific address register 3 match • 11 - Specific address register 4 match. If more than one specific address is matched, only one is indicated, with priority 4 down to 1. |
| 24 | This bit has a different meaning depending on whether RX checksum offloading is enabled. With RX checksum offloading disabled (bit 24 clear in Network Configuration): Type ID register match found; bits 23:22 indicate which type ID register caused the match. With RX checksum offloading enabled (bit 24 set in Network Configuration): • 0 - the frame was not SNAP encoded and/or had a VLAN tag with the CFI bit set • 1 - the frame was SNAP encoded and had either no VLAN tag or a VLAN tag with the CFI bit not set. |
| 23:22 | These bits have a different meaning depending on whether RX checksum offloading is enabled. With RX checksum offloading disabled (bit 24 clear in Network Configuration): Type ID register match. Encoded as follows: • 00 - Type ID register 1 match • 01 - Type ID register 2 match • 10 - Type ID register 3 match • 11 - Type ID register 4 match. If more than one Type ID is matched, only one is indicated, with priority 4 down to 1. With RX checksum offloading enabled (bit 24 set in Network Configuration): • 00 - Neither the IP header checksum nor the TCP/UDP checksum was checked. • 01 - The IP header checksum was checked and was correct. Neither the TCP nor UDP checksum was checked. • 10 - Both the IP header and TCP checksum were checked and were correct. • 11 - Both the IP header and UDP checksum were checked and were correct. |
| 21 | VLAN tag detected: type ID of 0x8100. For packets incorporating the stacked VLAN processing feature, this bit will be set if the second VLAN tag received has a type ID of 0x8100. |
| 20 | Priority tag detected: type ID of 0x8100 and null VLAN identifier. For packets incorporating the stacked VLAN processing feature, this bit will be set if the second VLAN tag received has a type ID of 0x8100 and a null VLAN identifier. |
| 19:17 | When bit 15 (End of frame) and bit 21 (VLAN tag) are set, these bits represent the VLAN priority. When header/data splitting is enabled (via bit 5 of the DMA configuration register, offset 0x10), bit 17 indicates this descriptor is pointing to the last buffer of the header. |
| 16 | This bit has a different meaning depending on the state of bit 13 (report bad FCS in bit 16 of word 1 of the receive buffer descriptor) and bit 5 (header/data splitting) of the DMA Configuration register (offset 0x10). When header/data splitting is enabled and this buffer descriptor (BD) is not the last BD of the frame (as indicated in bit 15 of this BD), this bit will indicate that the BD is pointing to a data buffer containing header bytes. When this BD is the last BD of the frame (as indicated in bit 15 of this BD), and bit 13 of the DMA configuration register is set, this bit represents FCS/CRC error. When this BD is the last BD of the frame (as indicated in bit 15 of this BD), and bit 13 of the DMA configuration register is clear, and the received frame is VLAN tagged, this bit represents the Canonical format indicator (CFI). |
| 15 | End of frame - when set the buffer contains the end of a frame. If end of frame is not set, then the only valid status bit (unless header/data splitting is enabled) is start of frame (bit 14). If header/data splitting is enabled, then bits 16 and 17 are also valid status bits when this bit is not set. |
| 14 | Start of frame - when set the buffer contains the start of a frame. If both bits 15 and 14 are set, the buffer contains a whole frame. |
| 13 | This bit has a different meaning depending on whether jumbo frames and ignore FCS mode are enabled. If neither mode is enabled, this bit will be zero. With jumbo frame mode enabled (bit 3 set in Network Configuration Register) or RSC enabled: additional bit for length of frame (bit[13]), which is concatenated with bits[12:0]. With ignore FCS mode enabled and jumbo frames disabled (bit 26 set and bit 3 clear in Network Configuration Register): indicates per-frame FCS status as follows: • 0 – Frame had good FCS • 1 – Frame had bad FCS, but was copied to memory as ignore FCS enabled. |
| 12:0 | When header/data splitting is enabled (via bit 5 of the DMA configuration register, offset 0x10) and bit 17 is set (last buffer of header), these bits represent the length of the header in bytes. When bit 15 (End of frame) is set, these bits represent the length of the received frame, which may or may not include FCS depending on whether FCS remove mode is enabled. With FCS discard mode disabled (bit 17 clear in Network Configuration Register): least significant 13 bits for length of frame including FCS. If jumbo frames are enabled, these 13 bits are concatenated with bit[13] of the descriptor above. With FCS discard mode enabled (bit 17 set in Network Configuration Register): least significant 13 bits for length of frame excluding FCS. If jumbo frames are enabled, these 13 bits are concatenated with bit[13] of the descriptor above. |
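To make the word 1 layout above more concrete, the following Python sketch decodes a subset of the status fields. It is an illustrative model only: the function and key names are hypothetical, and it assumes RX checksum offloading and header/data splitting are both disabled, so bits 24:22 and 16 are not interpreted.

```python
def bit(word, n):
    """Return bit n of a 32-bit word as a bool."""
    return bool((word >> n) & 1)


def decode_rx_word1(w1, jumbo=False):
    """Decode selected fields of receive BD word 1.

    Assumes RX checksum offload and header/data splitting are disabled.
    Apart from start of frame, the status bits are only meaningful when
    end of frame (bit 15) is set.
    """
    eof = bit(w1, 15)
    vlan = bit(w1, 21)
    # With jumbo frames enabled, bit 13 extends the 13-bit length field.
    length_mask = 0x3FFF if jumbo else 0x1FFF
    return {
        "broadcast": bit(w1, 31),
        "multicast_hash": bit(w1, 30),
        "unicast_hash": bit(w1, 29),
        "specific_addr_match": bit(w1, 27),
        # Bits 26:25 encode registers 1..4 as 00..11.
        "specific_addr_reg": ((w1 >> 25) & 0x3) + 1 if bit(w1, 27) else None,
        "vlan": vlan,
        "priority_tag": bit(w1, 20),
        "vlan_priority": (w1 >> 17) & 0x7 if (eof and vlan) else None,
        "sof": bit(w1, 14),
        "eof": eof,
        "length": (w1 & length_mask) if eof else None,
    }
```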
| Bit | Function |
|---|---|
| Word 2 (64-bit addressing) | |
| 31:0 | Upper 32-bit address of Data Buffer |
| Word 3 (64-bit addressing) | |
| 31:0 | Unused |
When Descriptor Timestamp Capture mode is enabled, the table below identifies the added descriptor words:
| Bit | Function |
|---|---|
| Word 2 (32-bit addressing) or Word 4 (64-bit addressing) | |
| 31:30 | Timestamp seconds [1:0] |
| 29:0 | Timestamp nanosecs [29:0] |
| Word 3 (32-bit addressing) or Word 5 (64-bit addressing) | |
| 31:10 | Unused |
| 9:0 | Timestamp seconds [11:2] |
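The timestamp is spread across the two added words: the low two bits of the seconds field sit above the nanoseconds in the first word, and the remaining ten bits are in the second word. The Python sketch below shows one way to reassemble them. The function name is hypothetical, and the caller is assumed to have already selected the correct pair of words for the addressing mode and checked the timestamp-valid indication in word 0 bit 2.

```python
def decode_rx_timestamp(ts_word_lo, ts_word_hi):
    """Reassemble (seconds, nanoseconds) from the two timestamp words.

    ts_word_lo: word 2 (32-bit addressing) or word 4 (64-bit addressing)
    ts_word_hi: word 3 (32-bit addressing) or word 5 (64-bit addressing)
    """
    nanoseconds = ts_word_lo & 0x3FFFFFFF       # bits 29:0
    seconds_lo = (ts_word_lo >> 30) & 0x3       # seconds [1:0]
    seconds_hi = ts_word_hi & 0x3FF             # seconds [11:2]
    seconds = (seconds_hi << 2) | seconds_lo    # 12-bit seconds value
    return seconds, nanoseconds
```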
Transmit DMA Buffers
Frames to transmit are stored in one or more transmit AXI buffers. Zero length AXI buffers are allowed, and the maximum number of buffers permitted for each transmit frame is 128.
The start location for each transmit buffer is stored in AXI memory in a list of transmit buffer descriptors at a location pointed to by the transmit buffer queue pointer. The base address for this queue pointer is set in software using the transmit buffer queue base address register(s).
The number of words in each buffer descriptor depends on the operating mode. The first two words (word 0 and word 1) are used for all buffer descriptor modes. In transmit extended buffer descriptor mode (bit 29 in the DMA configuration register), two buffer descriptor words are added for 64-bit addressing mode and two buffer descriptor words are added for timestamp capture. Therefore, there are either two, four, or six buffer descriptor words in each buffer descriptor entry depending on operating mode, and every buffer descriptor entry has the same number of words.
- Every descriptor will be 64 bits wide when 64-bit addressing is disabled and extended buffer descriptor mode is disabled.
- Every descriptor will be 128 bits wide when 64-bit addressing is enabled and extended buffer descriptor mode is disabled.
- Every descriptor will be 128 bits wide when 64-bit addressing is disabled and extended buffer descriptor mode is enabled.
- Every descriptor will be 192 bits wide when 64-bit addressing is enabled and extended buffer descriptor mode is enabled.
Table 4. Transmit Buffer Descriptor Entry Table - Non-LSO Frame
| Bit | Function |
|---|---|
| Word 0 | |
| 31:0 | Byte address of buffer |
| Word 1 | |
| 31 | Used – must be zero for GEM to read data from the transmit buffer. GEM sets this to one for the first buffer of a frame once it has been successfully transmitted. Software must clear this bit before the buffer can be used again. |
| 30 | Wrap – marks last descriptor in transmit buffer descriptor list. This can be set for any buffer within the frame. |
| 29 | Retry limit exceeded, transmit error detected. |
| 28 | Transmit underrun. Occurs when the start of packet data has been written into the FIFO and either hresp is not OK, or the transmit data could not be fetched in time, or when buffers are exhausted. This is not set when the DMA is configured for packet buffer mode. |
| 27 | Transmit frame corruption due to AHB or AXI error – set if an error occurs whilst midway through reading transmit frame from the AHB/AXI, including HRESP or RRESP/BRESP errors and buffers exhausted mid frame (if the buffers run out during transmission of a frame then transmission stops, FCS shall be bad and tx_er asserted). Also set in AHB (not AXI) DMA packet buffer mode if single frame is too large for configured packet buffer memory size. |
| 26 | Late collision, transmit error detected. Late collisions only force this status bit to be set in gigabit mode. |
| 25:24 | Reserved. |
| 23 | For Extended Buffer Descriptor Mode, this bit indicates a timestamp has been captured in the BD. Otherwise Reserved. |
| 22:20 | Transmit IP/TCP/UDP checksum generation offload errors: • 000 - No Error • 001 - The Packet was identified as a VLAN type, but the header was not fully complete, or had an error in it • 010 - The Packet was identified as a SNAP type, but the header was not fully complete, or had an error in it • 011 - The Packet was not of an IP type, or the IP packet was invalidly short, or the IP was not of type IPv4/IPv6 • 100 - The Packet was not identified as VLAN, SNAP or IP • 101 - Non supported packet fragmentation occurred. For IPv4 packets, the IP checksum was generated and inserted • 110 - Packet type detected was not TCP or UDP. TCP/UDP checksum was therefore not generated. For IPv4 packets, the IP checksum was generated and inserted • 111 - A premature end of packet was detected and the TCP/UDP checksum could not be generated |
| 19:17 | Reserved. Must be set to 3’b000 to disable TSO and UFO. |
| 16 | No CRC to be appended by MAC. When set, this implies that the data in the buffers already contains a valid CRC and hence no CRC or padding is to be appended to the current frame by the MAC. This control bit must be set for the first buffer in a frame and will be ignored for the subsequent buffers of a frame. This operation is different from Cadence’s Ethernet MAC 10/100 (Enhanced), which reads the no CRC bit from the final buffer descriptor in the frame. Note that this bit must be clear when using the transmit IP/TCP/UDP checksum generation offload, otherwise checksum generation and substitution will not occur. Note: This bit must also be cleared when TX Partial Store and Forward mode is active. |
| 15 | Last buffer. When set, this bit will indicate the last buffer in the current frame has been reached. |
| 14 | Reserved. |
| 13:0 | Length of buffer. |
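The control fields that software fills in for a non-LSO frame can be sketched as follows. This Python model is illustrative only: the names are hypothetical, the status bits that hardware writes back are left at zero, and the example does not write descriptors to memory. It applies the rules stated in this section: the no-CRC bit is placed on the first buffer only, the last-buffer bit on the final buffer, and at most 128 buffers per frame.

```python
TX_WRAP = 1 << 30      # last descriptor in the descriptor list
TX_NO_CRC = 1 << 16    # MAC must not append CRC or padding
TX_LAST = 1 << 15      # last buffer of the current frame
TX_LEN_MASK = 0x3FFF   # bits 13:0, length of buffer
MAX_TX_BUFFERS = 128   # maximum buffers per transmit frame


def make_tx_word1(length, last, wrap=False, no_crc=False):
    """Build word 1 with the used bit (31) clear so the DMA may read it."""
    if not 0 <= length <= TX_LEN_MASK:
        raise ValueError("buffer length does not fit in bits 13:0")
    word = length
    if last:
        word |= TX_LAST
    if wrap:
        word |= TX_WRAP
    if no_crc:
        word |= TX_NO_CRC
    return word


def frame_descriptors(buffers, no_crc=False):
    """Return [(word0, word1), ...] for one frame.

    `buffers` is a list of (byte_address, length) pairs, 32-bit addressing.
    """
    if not 1 <= len(buffers) <= MAX_TX_BUFFERS:
        raise ValueError("a frame needs between 1 and 128 buffers")
    descs = []
    for i, (addr, length) in enumerate(buffers):
        last = i == len(buffers) - 1
        # The no-CRC bit is only honored on the first buffer of a frame.
        word1 = make_tx_word1(length, last, no_crc=(no_crc and i == 0))
        descs.append((addr, word1))
    return descs
```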
| Bit | Function |
|---|---|
| Word 0 | |
| 31:0 | Byte address of buffer |
| Word 1 | |
| 31 | Used – must be zero for 10GbE to read data from the transmit buffer. GEM sets this to one for the first buffer of a frame once it has been successfully transmitted. Software must clear this bit before the buffer can be used again. |
| 30 | Wrap – marks last descriptor in transmit buffer descriptor list. This can be set for any buffer within the frame. |
| 29 | Retry limit exceeded, transmit error detected |
| 28 | Transmit underrun – always 0 for TSO |
| 27 | Transmit frame corruption due to AHB or AXI error. Set if an error occurs whilst midway through reading transmit frame from the AHB/AXI, including HRESP or RRESP/BRESP errors and buffers exhausted mid frame (if the buffers run out during transmission of a frame then transmission stops, FCS shall be bad and tx_er asserted). Also set in DMA packet buffer mode if single frame is too large for configured packet buffer memory size. |
| 26 | Late collision, transmit error detected. Late collisions only force this status bit to be set in gigabit mode. |
| 25:24 | TCP Stream Identifier. Used to select the hardware counter that is used for TCP sequence number generation. |
| 23 | For Extended Buffer Descriptor Mode, this bit indicates a timestamp has been captured in the BD. Otherwise Reserved. |
| 22:20 | Transmit IP/TCP/UDP checksum generation offload errors: • 000 - No Error • 001 - The Packet was identified as a VLAN type, but the header was not fully complete, or had an error in it • 010 - The Packet was identified as a SNAP type, but the header was not fully complete, or had an error in it • 011 - The Packet was not of an IP type, or the IP packet was invalidly short, or the IP was not of type IPv4/IPv6 • 100 - The Packet was not identified as VLAN, SNAP or IP • 101 - Non supported packet fragmentation occurred. For IPv4 packets, the IP checksum was generated and inserted • 110 - Packet type detected was not TCP or UDP. TCP/UDP checksum was therefore not generated. For IPv4 packets, the IP checksum was generated and inserted • 111 - A premature end of packet was detected and the TCP/UDP checksum could not be generated |
| 19 | TCP Sequence Number Source Select • 0 – Use sequence number value from the header buffer for the first small TCP frame and hardware generated sequence number values for subsequent small TCP frames • 1 – Use hardware generated sequence number values for all small TCP frames |
| 18:17 | LSO Control. Set to 2’b10 or 2’b11 to enable TSO |
| 16 | No CRC to be appended by MAC. Must be clear for TSO operation |
| 15 | Last buffer. Must be clear as TSO requires at least one payload buffer |
| 14 | Reserved. |
| 13:0 | Length of buffer. |
| Bit | Function |
|---|---|
| Word 0 | |
| 31:0 | Byte address of buffer. |
| Word 1 | |
| 31 | Used – must be zero for GEM to read data from the transmit buffer. GEM sets this to one for the first buffer of a frame once it has been successfully transmitted. Software must clear this bit before the buffer can be used again. |
| 30 | Wrap – marks last descriptor in transmit buffer descriptor list. This can be set for any buffer within the frame. |
| 29:16 | TCP Maximum Segment Size value in bytes. TSO will use a default value of 536 bytes if the programmed value is zero. |
| 15 | Last buffer. When set, this bit will indicate the last buffer in the current frame has been reached. |
| 14 | Reserved. |
| 13:0 | Length of buffer. |
| Bit | Function |
|---|---|
| Word 0 | |
| 31:0 | Upper 32-bit address of Data Buffer. |
| Word 1 | |
| 31:0 | Unused |
| Bit | Function |
|---|---|
| Word 2 (32-Bit Addressing) or Word 4 (64-Bit Addressing) | |
| 31:30 | Timestamp seconds [1:0] / launch time (See Note:1) |
| 29:0 | Timestamp nanosecs [29:0] / launch time (See Note:1) |
| Word 3 (32-Bit Addressing) or Word 5 (64-Bit Addressing) | |
| 31 | UTLT (Use Transmit Launch Time) |
| 30:10 | Unused |
| 9:0 | Timestamp seconds [11:2] (See Note:1) (prior to release 1p08f1 this was [5:2]) |
Note 1: The timestamp mode is controlled using the tx_bd_control_register. The TX Descriptor Timestamp Insertion mode bits are defined as:
- 00: TS insertion disabled
- 01: TS inserted for PTP event frames only
- 10: TS inserted for all PTP frames only
- 11: TS inserted for all frames
After transmission, the timestamp bits are written back only to the first buffer descriptor.
MAC Loopback
- In MAC internal loopback mode, both transmit and receive clocks are sourced from the internal Ethernet reference clocks.
- TBI mode must be disabled for internal loopback by setting 10GbE {0:3}.network_config[pcs_select] = 0.
Note: Receive and transmit must be disabled when switching into and out of internal loopback, because the clocks provided might glitch while switching to the loopback reference clock.
Link Status
The PCS link status is reported through the following register bits:
- Bit 2 of the PCS status register
- Bit 0 of the network status register
- Bit 9 of the interrupt status register
An interrupt is generated each time the PCS link status changes (i.e. link good or link bad).
Copy All Frames (or Promiscuous Mode)
If the copy all frames bit is set in the network configuration register, then all frames (except those that are too long, too short, have FCS errors, or have rx_er asserted during reception) are copied to memory. Frames with FCS errors are copied if bit [26] is set in the network configuration register.
Disable Copy of Pause Frames
Pause frames can be prevented from being written to memory by setting the disable copying of pause frames control bit [23] in the network configuration register. When set, pause frames are not copied to memory regardless of the copy all frames bit, whether a hash match is found, a type ID match is identified, or if a destination address match is found.
Broadcast Address
Frames with the broadcast address of 0xFFFFFFFFFFFF are stored to memory only if the no broadcast bit in the network configuration register is set to zero.
Hash Addressing
The hash address register is 64 bits long and takes up two locations in the memory map. The least significant bits are stored in hash register bottom and the most significant bits in hash register top.
- The unicast hash enable and the multicast hash enable bits in the network configuration register enable the reception of hash matched frames.
The destination address is reduced to a 6-bit index into the 64-bit hash register using the following hash function. The hash function is an XOR of every sixth bit of the destination address:
hash_index[05] = da[05] ^ da[11] ^ da[17] ^ da[23] ^ da[29] ^ da[35] ^ da[41] ^ da[47]
hash_index[04] = da[04] ^ da[10] ^ da[16] ^ da[22] ^ da[28] ^ da[34] ^ da[40] ^ da[46]
hash_index[03] = da[03] ^ da[09] ^ da[15] ^ da[21] ^ da[27] ^ da[33] ^ da[39] ^ da[45]
hash_index[02] = da[02] ^ da[08] ^ da[14] ^ da[20] ^ da[26] ^ da[32] ^ da[38] ^ da[44]
hash_index[01] = da[01] ^ da[07] ^ da[13] ^ da[19] ^ da[25] ^ da[31] ^ da[37] ^ da[43]
hash_index[00] = da[00] ^ da[06] ^ da[12] ^ da[18] ^ da[24] ^ da[30] ^ da[36] ^ da[42]
- If the hash index points to a bit that is set in the hash register, then the frame will be matched according to whether the frame is multicast or unicast.
- A multicast match will be signaled if the multicast hash enable bit is set, da[0] is logic 1 and the hash index points to a bit set in the hash register.
- A unicast match will be signaled if the unicast hash enable bit is set, da[0] is logic 0 and the hash index points to a bit set in the hash register.
- To receive all multicast frames, the hash register should be set with all ones and the multicast hash enable bit should be set in the network configuration register.
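The hash function and match rules above can be expressed compactly in Python. This is a sketch for illustration, and the function names are hypothetical. It treats the destination address as a 48-bit integer whose bit 0 is da[0]. The helper that builds this integer from the six address bytes assumes the first byte on the wire occupies the least significant position, which is consistent with da[0] being the multicast indicator but should be checked against the register documentation.

```python
def hash_index(da):
    """6-bit hash index: bit i is the XOR of da[i], da[i+6], ..., da[i+42]."""
    index = 0
    for i in range(6):
        b = 0
        for k in range(8):
            b ^= (da >> (i + 6 * k)) & 1
        index |= b << i
    return index


def da_from_bytes(mac):
    """Build the 48-bit da value from 6 address bytes.

    Assumption: the first byte on the wire is the least significant byte,
    so da[0] is the individual/group (multicast) bit.
    """
    if len(mac) != 6:
        raise ValueError("destination address must be 6 bytes")
    return int.from_bytes(mac, "little")


def hash_match(da, hash_reg, unicast_en, multicast_en):
    """Return 'multicast', 'unicast', or None for a 64-bit hash register."""
    if not (hash_reg >> hash_index(da)) & 1:
        return None
    if da & 1:
        return "multicast" if multicast_en else None
    return "unicast" if unicast_en else None
```

To accept one multicast address, software would set bit `hash_index(da)` in the 64-bit hash register and enable multicast hashing.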
Wake-on-LAN Support
The following events on received frames can trigger wake-on-LAN (WOL) detection:
- Magic packets.
- Address resolution protocol (ARP) requests to the device IP address.
- Specific address 1 filter match.
- Multicast hash filter match.
If one of these events occurs, WOL detection is indicated by asserting the wake-up interrupt. These events can be individually enabled through bits[19:16] of the wake-on-LAN register. Also, for WOL detection to occur, the receive enable must be set in the network control register.
The wake-up interrupt is asserted due to multicast filter events, an ARP request, or a specific address 1 match even in the presence of a frame error. For magic-packet events, the frame must be correctly formed and error free.
A magic packet event is detected if all of the following are true:
- Magic packet events are enabled through bit 16 of the Wake-on-LAN register.
- The frame's destination address matches specific address 1.
- The frame is correctly formed with no errors.
- The frame contains at least 6 bytes of 0xFF for synchronization.
- There are 16 repetitions of the contents of the specific address 1 register immediately following the synchronization.
An ARP request event is detected if all of the following are true:
- ARP request events are enabled through bit 17 of the Wake-on-LAN register.
- Broadcasts are allowed by bit 5 in the network configuration register.
- The frame has a broadcast destination address (bytes 1 to 6).
- The frame has a type ID field of 0x0806 (bytes 13 and 14).
- The frame has an ARP operation field of 0x0001 (bytes 21 and 22).
- The least significant 16 bits of the frame's ARP target protocol address (bytes 41 and 42) match the value programmed in bits[15:0] of the Wake-on-LAN register.
The decoding of the ARP fields adjusts automatically if a VLAN tag is detected within the frame. The reserved value of 0x0000 for the Wake-on-LAN target address value will not cause an ARP request event, even if matched by the frame.
A specific address 1 event is detected if all of the following are true:
- Specific address 1 events are enabled through bit 18 of the Wake-on-LAN register.
- The frame's destination address matches the value programmed in the specific address 1 registers.
A multicast hash event is detected if all of the following are true:
- Multicast hash events are enabled through bit 19 of the Wake-on-LAN register.
- Multicast hash filtering is enabled through bit 6 of the network configuration register.
- The frame destination address matches against the multicast hash filter.
- The frame destination address is not a broadcast.
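The content checks for a magic packet can be modeled in software as shown below. This Python sketch is only an approximation of the conditions listed above: the function name is hypothetical, it assumes the synchronization pattern may appear anywhere after the destination and source addresses, and it does not model the frame error (FCS, rx_er) checks or the enable bit.

```python
def is_magic_packet(frame, specific_addr1):
    """Check the destination address and the magic packet pattern.

    frame: frame bytes starting at the destination address, FCS excluded.
    specific_addr1: the 6-byte address held in specific address 1.
    """
    if len(specific_addr1) != 6 or len(frame) < 12:
        return False
    # Destination address must match specific address 1.
    if frame[0:6] != specific_addr1:
        return False
    # At least 6 bytes of 0xFF immediately followed by 16 repetitions of
    # the address; a longer run of 0xFF still ends with these 6 bytes.
    pattern = b"\xff" * 6 + specific_addr1 * 16
    return pattern in frame[12:]
```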
VLAN Support
The VLAN tag is inserted at the 13th byte of the frame adding an extra four bytes to the frame. To support these extra four bytes, the 10GbE controller can accept frame lengths up to 1,536 bytes by setting bit [8] in the network configuration register. If the VID (VLAN identifier) is null (0x000), this indicates a priority-tagged frame.
| 16-bit Tag Protocol Identifier (TPID) | 16-bit Tag Control Information (TCI) |
|---|---|
| 0x8100 | First 3 bits priority, then CFI bit, last 12 bits VID |
10GbE can be configured to reject all frames except VLAN tagged frames by setting the discard non-VLAN frames bit in the network configuration register.
The following bits in word 1 of the receive buffer descriptor give information about VLAN tagged frames:
- Bit 21 is set if the received frame is VLAN tagged (i.e. type ID of 0x8100).
- Bit 20 is set if the received frame is priority tagged (i.e. type ID of 0x8100 and null VID). If bit 20 is set, bit 21 is also set.
- Bits 19, 18, and 17 are set to the priority if bit 21 is set.
- Bit 16 is set to the CFI if bit 21 is set.
The 10GbE decoder treats a VLAN tag with the CFI bit set to one as invalid. Packet inspection does not continue past the source address if a VLAN tagged frame with the CFI bit set to one is received. CFI stands for canonical format indicator and was defined to be zero for Ethernet frames. A value of one was used for VLAN token ring frames. 802.1Q has since redefined this bit to mean drop eligible indicator (DEI). For backward compatibility reasons, 10GbE continues to treat VLAN tagged frames with the CFI/DEI bit set as invalid.
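The tag layout described in this section (TPID at the 13th byte, followed by a TCI made of 3 priority bits, the CFI bit, and a 12-bit VID) can be parsed as in the Python sketch below. The function name and the returned keys are hypothetical, and the sketch handles a single 0x8100 tag only, not stacked VLANs.

```python
VLAN_TPID = 0x8100


def parse_vlan_tag(frame):
    """Return the VLAN fields of a frame, or None if it is not VLAN tagged.

    frame: frame bytes starting at the destination address.
    """
    if len(frame) < 16:
        return None
    tpid = int.from_bytes(frame[12:14], "big")  # 13th and 14th bytes
    if tpid != VLAN_TPID:
        return None
    tci = int.from_bytes(frame[14:16], "big")
    vid = tci & 0x0FFF
    return {
        "priority": tci >> 13,         # first 3 bits
        "cfi": (tci >> 12) & 1,        # CFI (DEI in current 802.1Q)
        "vid": vid,                    # last 12 bits
        "priority_tagged": vid == 0,   # null VID marks a priority tag
    }
```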
Checksum Offload for IP, TCP, and UDP
10GbE can be programmed to perform IP, TCP, and UDP checksum offloading in both the receive and transmit direction which is enabled by setting:
Bit 24 in the network configuration register for receive and
- IPv4 packets contain a 16-bit checksum field, which is the 16-bit 1’s complement of the 1’s complement sum of all 16-bit words in the header.
- TCP and UDP packets contain a 16-bit checksum field, which is the 16-bit 1’s complement of the 1’s complement sum of all 16-bit words in the header, the data and a conceptual IP pseudo header.
- To calculate these checksums in software requires each byte of the packet to be processed. For TCP and UDP, this can use a large amount of processing power. Offloading the checksum calculation to hardware can result in significant performance improvements.
For IP, TCP, or UDP checksum offload to be useful, the operating system containing the protocol stack must be aware that this offload is available so that it can make use of the fact that the hardware can either generate or verify the checksum.
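For reference, the 16-bit one's complement checksum described above can be computed in software as follows. This Python sketch shows the standard IPv4 header calculation that the offload replaces. The function names are hypothetical, and the sketch says nothing about how the hardware implements it.

```python
def ones_complement_sum16(data):
    """One's complement sum of all 16-bit big-endian words in `data`."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    # Fold carries back into the low 16 bits (end-around carry).
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def ipv4_header_checksum(header):
    """Checksum of an IPv4 header, with the checksum field treated as zero."""
    zeroed = header[:10] + b"\x00\x00" + header[12:]
    return ~ones_complement_sum16(zeroed) & 0xFFFF
```

A receiver can verify a header by summing it with the checksum field in place: a correct header sums to 0xFFFF. The TCP and UDP checksums use the same sum over the pseudo header, the protocol header, and the data.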
Receiver Checksum Offload
- If present, the VLAN header must be four octets long and the CFI bit must not be set. (Also, for receive, one stacked VLAN is supported.)
- Encapsulation must be RFC 894 Ethernet Type Encoding or RFC 1042 SNAP Encoding or PPPoE Encoding
- IPv4 packet
- IP header is of a valid length
- IP options are supported
- IPv4 or IPv6 packet
- IP options and all IPv6 extension headers (i.e. hop-by-hop, routing and destination) are supported (except for fragmentation headers)
- Good IP header checksum (if IPv4)
- IP fragmentation is not supported (If a packet is fragmented, then the checksum will not be checked)
- Reserved bit must not be set in the IPv4 header flags field
- TCP or UDP packet
When an IP, TCP, or UDP frame is received, the receive buffer descriptor gives an indication if 10GbE was able to verify the checksums. There is also an indication if the frame had SNAP encapsulation. These indication bits will replace the type ID match indication bits when the receive checksum offload is enabled. If any of the checksums are verified incorrect by 10GbE, the packet is discarded and the appropriate statistics counter incremented.
Transmitter Checksum Offload
The transmitter checksum offload is only available if 10GbE is configured to use the DMA in packet buffer mode and full store and forward mode is enabled. This is because the complete frame to be transmitted must be read into the packet buffer memory before the checksum can be calculated and written back into the headers at the beginning of the frame.
- For transmit checksum generation and substitution to occur, the protocol of the frame must be recognized, and the frame must be provided without the FCS field, by making sure that bit [16] of the transmit descriptor word 1 is clear. If the frame data already had the FCS field, this would be corrupted by the substitution of the new checksum fields.
- Stacked VLANs are supported as long as the VLAN type field of the stacked VLAN is set to 0x88A8. This differs from receive where the stacked VLAN type field is a programmable value. 0x88A8 is the ethertype for the S-TAG defined in the IEEE 802.1ad QinQ standard.
- If these conditions are met, the transmit checksum offload engine will calculate the IP, TCP, and UDP checksums as appropriate. Once the full packet is completely written into packet buffer memory, the checksums will be valid and the relevant DPRAM locations will be updated for the new checksum fields as per standard IP/TCP and UDP packet structures.
If the transmitter checksum engine is prevented from generating the relevant checksums, bits [22:20] of the transmitter DMA writeback status will be updated to identify the reason for the error. Note: The frame will still be transmitted but without the checksum substitution, as the usual reason the substitution did not occur is that the protocol was not recognized.
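A driver can turn the writeback status in bits [22:20] of transmit descriptor word 1 into a readable reason. The Python sketch below uses abbreviated descriptions taken from the transmit descriptor table. The dictionary and function names are hypothetical.

```python
# Codes for bits 22:20 of transmit BD word 1 (descriptions abbreviated).
TX_CSUM_ERRORS = {
    0b000: "no error",
    0b001: "VLAN header incomplete or in error",
    0b010: "SNAP header incomplete or in error",
    0b011: "not an IP packet, IP packet too short, or not IPv4/IPv6",
    0b100: "not identified as VLAN, SNAP or IP",
    0b101: "unsupported fragmentation (IPv4 header checksum still inserted)",
    0b110: "not TCP or UDP (IPv4 header checksum still inserted)",
    0b111: "premature end of packet",
}


def tx_checksum_status(word1):
    """Return the checksum offload status text for transmit BD word 1."""
    return TX_CSUM_ERRORS[(word1 >> 20) & 0x7]
```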
Receive Header/Data Splitting
The DMA design has support for various optional offload features to reduce CPU overhead.
- Receive header data splitting is a feature which when enabled will force the receive DMA to split a received frame into its header and payload constituent parts.
- The header is not dropped, but instead separated from the payload and placed into its own DMA receive buffer. For example, a TCP/IPv4 frame will be split such that its Ethernet, IPv4 and TCP header is written into one or more buffers, and the TCP payload is written into one or more separate buffers.
- A status bit in the receive descriptors will identify whether the descriptor is pointing to a header buffer or a payload buffer
- Another status bit will identify whether the descriptor is pointing to the last buffer of a header (the header may be split over multiple buffers).
- The length of the header will also be written in the descriptor that points to the last buffer of the header.
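The descriptor handling described in the list above can be sketched as follows. This is a software model under stated assumptions: the exact bit positions of the status bits are not given here, so they are represented as boolean fields, and the `RxDescriptor` type and `split_frame` function are hypothetical names. Trimming the collected header to the reported length is also an assumption about how a driver would use that field.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class RxDescriptor:
    data: bytes                 # contents of the buffer the descriptor points to
    is_header: bool             # status bit: header buffer (True) or payload buffer
    last_header: bool = False   # status bit: last buffer of the header
    header_length: int = 0      # meaningful only when last_header is set


def split_frame(descriptors: List[RxDescriptor]) -> Tuple[bytes, bytes]:
    """Reassemble one frame's header and payload from its descriptor chain."""
    header = bytearray()
    payload = bytearray()
    header_length = None
    for desc in descriptors:
        if desc.is_header:
            header += desc.data
            if desc.last_header:
                header_length = desc.header_length
        else:
            payload += desc.data
    if header_length is not None:
        # Assumption: the reported length covers the whole header.
        header = header[:header_length]
    return bytes(header), bytes(payload)
```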
When header/data splitting is enabled, ALL received frames will have their L2/L3/L4 headers separated. Note: Even standard Ethernet-only frames containing just 14 bytes of header will be separated. Bit 5 of the DMA configuration register (offset 0x10) enables or disables header/data splitting. The header parser checks for the following encapsulations:
- VLAN
- SNAP
- PPPoE
- IP
- TCP
If any of these encapsulations are recognized, then they will be included in the header buffer.
- Header Data Splitting applies to all received frames. Every frame received which includes a data payload will be split into at least 2 data buffers.
- Header Data Splitting is not currently available with the partial store and forward mode of operation. It should only be enabled when the full store and forward mode of operation is active.
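A minimal sketch of enabling the feature, assuming bit 5 is active high (1 = enabled) and modeling the register file as a Python dict keyed by offset. The function name and the `partial_store_and_forward` argument are illustrative; real firmware would read the mode from its own configuration state.

```python
DMA_CONFIG_OFFSET = 0x0010
HDR_DATA_SPLIT_EN = 1 << 5   # assumption: writing 1 enables splitting


def set_header_data_splitting(regs: dict, enable: bool,
                              partial_store_and_forward: bool) -> None:
    """Set or clear bit 5 of the DMA configuration register."""
    if enable and partial_store_and_forward:
        # Splitting is only valid in full store and forward mode.
        raise ValueError("header/data splitting requires full store and forward")
    value = regs.get(DMA_CONFIG_OFFSET, 0)
    if enable:
        value |= HDR_DATA_SPLIT_EN
    else:
        value &= ~HDR_DATA_SPLIT_EN
    regs[DMA_CONFIG_OFFSET] = value & 0xFFFFFFFF
```

The read-modify-write leaves the other fields of the register, such as the endianness bits 6 and 7, unchanged.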
Low Power Features
10 GbE provides Energy-Efficient Ethernet (EEE) support. IEEE 802.3az adds support for energy efficiency to Ethernet. These are the key features of 802.3az:
- Allows a system’s transmit path to enter a low-power mode if there is nothing to transmit
- Allows a PHY to detect whether its link partner’s transmit path is in low-power mode, therefore allowing the system’s receive path to enter low-power mode
- Link remains up during low-power mode and no frames are dropped
- Asymmetric, one direction can be in low-power mode while the other is transmitting normally
- LPI (Low Power Idle) signaling is used to control entry and exit to and from low power modes
- LPI signaling can only take place if both sides have indicated support for it through auto-negotiation
- Low-power control is done at the MII (reconciliation sublayer).
- As an architectural convenience in writing IEEE 802.3az, it is assumed that transmission is deferred by asserting carrier sense; in practice it is not done this way. This system will know when it has nothing to transmit and only enter low-power mode when it is not transmitting.
- LPI should not be requested unless the link has been up for at least one second
- LPI is signaled on the GMII transmit path by asserting 0x01 on txd with tx_en low and tx_er high
- A PHY, on seeing LPI requested on the MII, sends the sleep signal before going quiet. After going quiet, it periodically emits refresh signals.
- The sleep, quiet, and refresh periods are defined in Table 78-2 of IEEE 802.3az. For 1000BASE-X, the sleep period is 20 microseconds, the quiet period 2.5 milliseconds, and the refresh period 20 microseconds.
- 1000BASE-X is required to go quiet after sleep is signaled. The easiest way to do this is to write to a control register to disable transmit in the SerDes.
- SGMII is not part of IEEE 802.3az and should not go quiet after sleep is signaled.
- LPI mode ends by transmitting normal idle for the wake time. There is a default time for this but it can be adjusted in software using the Link Layer Discovery Protocol (LLDP) described in Clause 79 of 802.3az.
- LPI is indicated at the receive side when sleep and refresh signaling has been detected.
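Two of the points above lend themselves to a short illustration: the GMII encoding of an LPI request, and how little time the transmitter is active during a quiet/refresh cycle with the 1000BASE-X figures quoted. The following is a sketch only; the function names are illustrative and the signals are modeled as plain Python values.

```python
QUIET_US = 2500.0    # 1000BASE-X quiet period (2.5 ms)
REFRESH_US = 20.0    # 1000BASE-X refresh period


def is_gmii_lpi(txd: int, tx_en: bool, tx_er: bool) -> bool:
    """True when the GMII transmit signals encode an LPI request."""
    return txd == 0x01 and not tx_en and tx_er


def refresh_duty_cycle(quiet_us: float = QUIET_US,
                       refresh_us: float = REFRESH_US) -> float:
    """Fraction of each quiet/refresh cycle spent sending refresh signals."""
    return refresh_us / (quiet_us + refresh_us)
```

With the values above the transmitter is active for 20 of every 2520 microseconds, which is under 1% of the time spent in low-power mode.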
LPI Operation in IP
It is best to use firmware to control LPI. LPI operation is something that happens at the system level. Firmware gives maximum control and flexibility of operation. LPI operation is straightforward, and firmware should be capable of responding within the required timeframes.
- If the link has been up for 1 second and there is nothing being transmitted, write to the LPI bit in the network control register.
- If connected to 1000BASE-T PHY using SGMII or RGMII, there is nothing more to do.
- If connected to a backplane using a 1000BASE-KX PHY, use firmware to periodically disable the SerDes transmit path. (Write to bit 1.160.0 for 1000BASE-KX.)
- Wake up by clearing the LPI bit in the network control register
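The transmit-side steps above can be sketched as a small polling controller. This is a model under assumptions: the position of the LPI bit in the network control register is not given here, so register access is abstracted behind a hypothetical `write_lpi_bit` callback, and the periodic SerDes disable needed for 1000BASE-KX is not modeled.

```python
LINK_UP_MIN_SECONDS = 1.0   # LPI should not be requested before this


class LpiTxController:
    def __init__(self, write_lpi_bit):
        # write_lpi_bit(True/False) is a hypothetical hook that sets or
        # clears the LPI bit in the network control register.
        self._write_lpi_bit = write_lpi_bit
        self.lpi_active = False

    def poll(self, link_up_seconds: float, tx_pending: bool) -> bool:
        """Enter LPI when idle on a stable link; wake when traffic appears."""
        want_lpi = link_up_seconds >= LINK_UP_MIN_SECONDS and not tx_pending
        if want_lpi != self.lpi_active:
            self._write_lpi_bit(want_lpi)
            self.lpi_active = want_lpi
        return self.lpi_active
```

Writing the register only on a state change keeps the firmware from repeatedly rewriting the same value on every poll.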
The LPI bit is ORed into the transmit pause functionality, so transmission pauses while the bit is set. After tx_lpi_en is deasserted, transmission remains paused for the time programmed in sys_wake_time, which counts in units of:
- 64 ns for gigabit operation
- 320 ns for 100M operation
- 3200 ns for 10M operation
In other words, each unit is 8 tx_clk periods. For example, if sys_wake_time is set to 8 in 100M mode, transmission will be paused for 8 * 320 ns = 2.56 microseconds after deassertion of tx_lpi_en.
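The wake-time arithmetic can be checked with a short calculation. The sketch below derives each unit as 8 tx_clk periods, using the standard GMII/MII transmit clock periods (8 ns at 1G, 40 ns at 100M, 400 ns at 10M); the function name and the speed keys are illustrative.

```python
TX_CLK_PERIOD_NS = {"1G": 8, "100M": 40, "10M": 400}
CLOCKS_PER_UNIT = 8   # one sys_wake_time unit is 8 tx_clk periods


def wake_pause_ns(sys_wake_time: int, speed: str) -> int:
    """Pause after tx_lpi_en deassertion, in nanoseconds."""
    unit_ns = CLOCKS_PER_UNIT * TX_CLK_PERIOD_NS[speed]
    return sys_wake_time * unit_ns
```

For sys_wake_time = 8 in 100M mode this gives 8 * 320 ns = 2560 ns.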
On the receive side:
- Wait for an interrupt to indicate that LPI has been received
- Disable relevant parts of the receive path if desired but keep the PCS and SerDes active
- Wait for an interrupt to indicate that regular idle has been received and then re-enable the receive path.
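The receive-side handling can be sketched as a pair of interrupt handlers. This is a model only: `set_rx_path_enabled` is a hypothetical hook for gating whatever parts of the receive path the system chooses to power down, and it must not touch the PCS or SerDes, which stay active so that the return to regular idle can be detected.

```python
class LpiRxHandler:
    def __init__(self, set_rx_path_enabled):
        # Hypothetical hook: enables or disables optional receive-path logic.
        self._set_rx_path_enabled = set_rx_path_enabled
        self.in_low_power = False

    def on_lpi_received(self) -> None:
        """Interrupt: LPI detected on the receive side."""
        if not self.in_low_power:
            self._set_rx_path_enabled(False)
            self.in_low_power = True

    def on_regular_idle_received(self) -> None:
        """Interrupt: regular idle detected, so the link partner has woken."""
        if self.in_low_power:
            self._set_rx_path_enabled(True)
            self.in_low_power = False
```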