The following configuration options are applicable to RHEL7 systems.
First, set some configuration options that decrease latency for Onload acceleration technologies. On both machines:
- Add the following options to the kernel command line. On RHEL7 this is typically done by appending them to the GRUB_CMDLINE_LINUX entry in /etc/default/grub (rather than editing /boot/grub/grub.conf directly) and then regenerating the GRUB2 configuration:
isolcpus=<comma separated cpu list> nohz=off iommu=off intel_iommu=off mce=ignore_ce nmi_watchdog=0
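For example, a minimal sketch assuming a BIOS-boot system with the default GRUB2 file locations (the generated file path differs on UEFI systems):
# grub2-mkconfig -o /boot/grub2/grub.cfg
A reboot is required for the new kernel command line to take effect.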
- Stop the following services on the server:
systemctl stop cpupower
systemctl stop cpuspeed
systemctl stop cpufreqd
systemctl stop powerd
systemctl stop irqbalance
systemctl stop firewalld
- Allocate huge pages. For example, to configure 1024 huge pages:
# sysctl -w vm.nr_hugepages=1024
To make this change persistent, update /etc/sysctl.conf. For example:
# echo "vm.nr_hugepages = 1024" >> /etc/sysctl.conf
For more information refer to Allocating Huge Pages.
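To confirm the allocation, the huge page counters in /proc/meminfo can be inspected. For example:
# grep Huge /proc/meminfo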
- Consider the selection of the NUMA node, as this affects latency on a NUMA-aware system. Refer to Onload Deployment on NUMA Systems.
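As an illustrative sketch (the interface name is a placeholder and node 0 is only an example), the NUMA node local to the adapter can be read from sysfs, and the application can then be bound to that node with numactl:
# cat /sys/class/net/<interface>/device/numa_node
# numactl --cpunodebind=0 --membind=0 <application>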
- Disable interrupt moderation.
# ethtool -C <interface> rx-usecs 0 adaptive-rx off
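The resulting coalescing settings can be verified with:
# ethtool -c <interface>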
- Enable PIO in the Onload environment.
EF_PIO=1
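EF_PIO is read from the environment when the accelerated application starts. For example, assuming the application is run under the onload launcher (the application name is a placeholder):
# EF_PIO=1 onload <application>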
Now perform the following configuration to improve latency without Onload.
Note: These configuration changes have minimal effect on the performance of Onload.
- Set interrupt affinity such that interrupts and the application are running on different CPU cores but on the same processor package.
- Use the following command to identify the interrupts used by the receive queues
created for an interface:
# cat /proc/interrupts | grep <interface>
The output lists the IRQs. For example:
34: ... PCI-MSI-edge p2p1-0
35: ... PCI-MSI-edge p2p1-1
36: ... PCI-MSI-edge p2p1-2
37: ... PCI-MSI-edge p2p1-3
38: ... PCI-MSI-edge p2p1-ptp
- Direct the listed IRQs to unused CPU cores that are on the same processor package as the application. For example, to direct IRQs 34-38 to CPU core 2 (where cores are numbered from 0 upwards), using bash:
# for irq in {34..38}
> do
> echo 04 > /proc/irq/$irq/smp_affinity
> done
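The mask that was applied can be read back to confirm the change. For example, for IRQ 34:
# cat /proc/irq/34/smp_affinity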
- Set an appropriate tuned profile:
- The tuned network-latency profile produces better kernel latency results:
# tuned-adm profile network-latency
- If available, the cpu-partitioning profile includes the network-latency profile, but also makes it easy to isolate cores that can be dedicated to interrupt handling or to an application. For example, to isolate cores 1-3:
# echo "isolated_cores=1-3" \ > /etc/tuned/cpu-partitioning-variables.conf # tuned-adm profile cpu-partitioning
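Whichever profile is selected, the profile currently in effect can be confirmed with:
# tuned-adm active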
- Enable the kernel “busy poll” feature, which polls the socket receive queue directly instead of waiting for interrupts. The following values are recommended:
# sysctl net.core.busy_poll=50 && sysctl net.core.busy_read=50
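As with the huge page setting, these values can be made persistent by adding them to /etc/sysctl.conf. For example:
# echo "net.core.busy_poll = 50" >> /etc/sysctl.conf
# echo "net.core.busy_read = 50" >> /etc/sysctl.conf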