Packet loss degrades network performance. The effect is especially pronounced for reliable data transfer protocols built on top of unicast or multicast UDP sockets.
First, check whether packets have been dropped by the network adapter before reaching the Onload stack. Use ethtool to collect statistics directly from the network adapter:

```
# ethtool -S enps0f0 | grep -E 'drop|discard'
```
| Counter | Description |
|---|---|
| rx_noskb_drops | Number of packets dropped when there are no further socket buffers to use. |
| port_rx_nodesc_drops | Number of packets dropped when there are no further descriptors in the RX ring buffer to receive them. |
| port_rx_dp_di_dropped_packets | Number of packets dropped because filters indicate the packets should be dropped. This can happen when a packet does not match any filter, or when the matched filter indicates the packet should be dropped. |
| port_rx_dp_q_disabled_packets | Number of packets sent to a queue which does not exist. A small number might be observed following initialization or teardown; a larger or incrementing number might indicate a mismatch between the size of a VI set and the actual number of VIs. |
| port_rx_pm_discard_bb_overflow | Number of packets discarded due to packet memory buffer overflow. |
| port_rx_pm_discard_vfifo_full | Number of packets dropped because of a lack of main packet memory on the adapter to receive the packet into. |
| port_rx_pm_discard_mapping | Number of packets dropped because they have an 802.1p priority level configured to be dropped. |
```
# ethtool -S enps0f0 | grep drop
rx_noskb_drops: 0
port_rx_nodesc_drops: 0
port_rx_dp_di_dropped_packets: 681618610
```
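A single snapshot only shows the cumulative totals; what usually matters is which counters are still incrementing while the problem occurs. The sketch below, a hypothetical helper rather than part of Onload, diffs the drop/discard counters from two ethtool readings. The interface name is a placeholder taken from the example above.

```python
import re
import subprocess
import time

# Matches ethtool -S lines whose counter name mentions "drop" or "discard"
DROP_RE = re.compile(r"^\s*(\S*(?:drop|discard)\S*):\s*(\d+)\s*$", re.MULTILINE)

def parse_drop_counters(ethtool_output):
    """Return {counter_name: value} for drop/discard counters in ethtool -S text."""
    return {name: int(value) for name, value in DROP_RE.findall(ethtool_output)}

def snapshot(interface):
    """Collect current drop/discard counter values for one interface."""
    out = subprocess.run(["ethtool", "-S", interface],
                         capture_output=True, text=True, check=True).stdout
    return parse_drop_counters(out)

if __name__ == "__main__":
    try:
        iface = "enps0f0"  # replace with your interface name
        before = snapshot(iface)
        time.sleep(5)
        after = snapshot(iface)
        for name, value in sorted(after.items()):
            delta = value - before.get(name, 0)
            if delta:
                print(f"{name}: +{delta} in 5s")
    except (FileNotFoundError, subprocess.CalledProcessError):
        pass  # ethtool unavailable or interface not present on this machine
```

Counters that grow between the two snapshots point at where packets are currently being lost, which narrows down which of the remedies below applies.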
Solution
The most common cause for this is the application being descheduled. You can detect this using the scheduling statistics from `cat /proc/<pid>/sched` for the application process. The nr_involuntary_switches counter records the number of times the process was descheduled, for example by an interrupt handler or another task running on the same CPU core. Ensure that the application's CPU cores are isolated to avoid descheduling. If it is not possible to isolate the cores, consider switching to interrupt mode.
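As a sketch of that check, the following hypothetical helper parses nr_involuntary_switches out of `/proc/<pid>/sched` and reports how fast it grows. The field layout (`name : value`) matches what Linux emits, but the helper names are illustrative, not part of any tool.

```python
import time

def parse_sched_counter(sched_text, counter="nr_involuntary_switches"):
    """Extract one counter from /proc/<pid>/sched text (lines are 'name : value')."""
    for line in sched_text.splitlines():
        parts = line.split(":", 1)
        if len(parts) == 2 and parts[0].strip() == counter:
            return int(parts[1])
    return None

def watch_descheduling(pid, interval=1.0):
    """Periodically print how often the process was involuntarily descheduled."""
    def read():
        with open(f"/proc/{pid}/sched") as f:
            return parse_sched_counter(f.read())
    prev = read()
    while True:
        time.sleep(interval)
        cur = read()
        print(f"pid {pid}: {cur - prev} involuntary switches in {interval:.0f}s")
        prev = cur
```

A rate that is near zero suggests descheduling is not the cause; a steadily climbing rate during packet loss is strong evidence that core isolation (or interrupt mode) is needed.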
If packet loss is observed at the network level due to a lack of receive buffering, try increasing the size of the receive descriptor queue via EF_RXQ_SIZE. If packet drops are observed at the socket level, consult the application documentation; it may also be worth experimenting with socket buffer sizes (see EF_UDP_RCVBUF). Setting the EF_EVS_PER_POLL variable to a higher value can also improve efficiency. Refer to the Parameter Reference for descriptions of these variables.
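As a minimal sketch, the EF_* tunables above are set in the environment of the Onload-accelerated process. The application name and the specific values are illustrative assumptions; consult the Parameter Reference for each variable's valid range and default before tuning.

```shell
# Illustrative values only -- validate against the Parameter Reference.
export EF_RXQ_SIZE=4096        # larger receive descriptor ring
export EF_UDP_RCVBUF=8388608   # larger per-socket UDP receive buffer (bytes)
export EF_EVS_PER_POLL=96      # handle more events per stack poll

# Hypothetical application binary run under Onload acceleration.
onload ./my_udp_app
```

Change one variable at a time and re-check the ethtool and socket-level drop counters after each change, so you can attribute any improvement to a specific setting.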