Any packet loss degrades network performance. The effect is especially pronounced for reliable data transfer protocols that are built on top of unicast or multicast UDP sockets.
First, check to see if packets have been dropped by the network adapter before reaching the Onload stack. Use ethtool to collect stats directly from the network adapter:
# ethtool -S enps0f0 | grep -E 'drop|discard'
Counter | Description |
---|---|
rx_noskb_drops | Number of packets dropped when there are no further socket buffers to use. |
port_rx_nodesc_drops | Number of packets dropped when there are no further descriptors in the rx ring buffer to receive them. |
port_rx_dp_di_dropped_packets | Number of packets dropped because filters indicate the packets should be dropped - this can happen when packets do not match any filter or the matched filter indicates the packet should be dropped. |
port_rx_dp_q_disabled_packets | Number of packets sent to a queue which does not exist. A small number might be observed following initialization or teardown, a larger number or incrementing number might indicate a mismatch between the size of a VI set and the actual number of VIs. |
port_rx_pm_discard_bb_overflow | Number of packets discarded due to packet memory buffer overflow. |
port_rx_pm_discard_vfifo_full | Count of the number of packets dropped because of a lack of main packet memory on the adapter to receive the packet into. |
port_rx_pm_discard_mapping | Number of packets dropped because they have an 802.1p priority level configured to be dropped. |
# ethtool -S enps0f0 | grep drop
rx_noskb_drops: 0
port_rx_nodesc_drops: 0
port_rx_dp_di_dropped_packets: 681618610
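The counters reported by ethtool -S are cumulative, so a large total such as the port_rx_dp_di_dropped_packets value above does not by itself show whether packets are still being dropped. The following is a minimal sketch, not part of Onload, that compares two saved snapshots and prints only the drop and discard counters that changed. The function name drop_deltas and the snapshot file names are hypothetical.

```shell
#!/bin/sh
# Print drop/discard counters whose value changed between two saved
# `ethtool -S` snapshots. Hypothetical helper, not an Onload tool.
# Usage: drop_deltas BEFORE_FILE AFTER_FILE
drop_deltas() {
    awk -F': *' '
        # Ignore every line that is not a drop or discard counter.
        !/drop|discard/ { next }
        # Strip the leading indentation that ethtool prints before each name.
        { gsub(/^[ \t]+/, "", $1) }
        # First file: remember the starting value of each counter.
        NR == FNR { before[$1] = $2; next }
        # Second file: report counters that differ from the first snapshot.
        ($1 in before) && ($2 != before[$1]) {
            printf "%s: +%.0f\n", $1, $2 - before[$1]
        }
    ' "$1" "$2"
}

# Example (interface name taken from the commands above):
#   ethtool -S enps0f0 > /tmp/before.txt
#   sleep 10
#   ethtool -S enps0f0 > /tmp/after.txt
#   drop_deltas /tmp/before.txt /tmp/after.txt
```

A counter that keeps appearing in this output while the application is running is the one to investigate first. Counters that do not change between the two snapshots are not printed.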
Solution
The most common cause of this is the application being descheduled. You can detect this using the scheduling statistics from cat /proc/<pid>/sched for the application. The nr_involuntary_switches counter records the number of times the process was descheduled, for example because of an interrupt handler or another task running on the same CPU core. You should ensure that the application CPU cores are isolated to avoid descheduling. If it is not possible to isolate the cores, consider switching to interrupt mode.
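As a rough illustration of the check described above, the sketch below extracts nr_involuntary_switches from a sched statistics file so that two readings can be compared. The function name involuntary_switches and the example PID are hypothetical. It assumes the usual "name : value" layout of /proc/<pid>/sched, which is only present on kernels built with scheduler debug statistics.

```shell
#!/bin/sh
# Print the nr_involuntary_switches value from a sched statistics file.
# Hypothetical helper; assumes lines of the form "name   :   value".
# Usage: involuntary_switches /proc/<pid>/sched
involuntary_switches() {
    awk -F: '
        # Match the counter name exactly, ignoring the padding after it.
        $1 ~ /^nr_involuntary_switches[ \t]*$/ {
            gsub(/[ \t]/, "", $2)   # drop the padding around the value
            print $2
        }
    ' "$1"
}

# Example with a hypothetical PID: count involuntary switches over 5 seconds.
#   pid=12345
#   a=$(involuntary_switches /proc/$pid/sched)
#   sleep 5
#   b=$(involuntary_switches /proc/$pid/sched)
#   echo "involuntary switches in 5 s: $((b - a))"
```

A count that rises while packets are being dropped suggests that the receiving thread is being preempted. For a multithreaded application, the same function can be pointed at the per-thread files under /proc/<pid>/task/<tid>/sched, assuming they use the same layout.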
If packet loss is observed at the network level due to a lack of receive buffering, try increasing the size of the receive descriptor queue via EF_RXQ_SIZE. If packet drops are observed at the socket level, consult the application documentation. It might also be worth experimenting with socket buffer sizes (see EF_UDP_RCVBUF). Setting the EF_EVS_PER_POLL variable to a higher value can also improve efficiency. Refer to Parameter Reference for descriptions of these variables.
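The variables named above are read from the environment of the accelerated process, so one way to experiment is to export them before launching the application. The values below are illustrative assumptions rather than recommendations, and ./my_app is a hypothetical application. Check the Parameter Reference for the permitted range and default of each variable before using them.

```shell
#!/bin/sh
# Illustrative values only; confirm valid ranges in the Parameter Reference.
export EF_RXQ_SIZE=4096        # larger receive descriptor queue
export EF_UDP_RCVBUF=8388608   # UDP socket receive buffer size, in bytes
export EF_EVS_PER_POLL=128     # handle more events per poll

# Launch the (hypothetical) application under Onload with these settings:
#   onload ./my_app
```

Changing one variable at a time and re-checking the drop counters makes it easier to tell which setting had an effect.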