Hi there,
suppose you have one of those fancy Intel CPUs with a lot of cores (128 in our case) and you want to use them together with an Intel E810 at 100 Gb/s to distribute your traffic to a lot of Suricata workers, only to find the number of interrupts limited to 64.
This also means it doesn't make sense to have more than 64 RSS queues, even though you can increase them easily using ethtool.
If you do increase the number of RSS queues (ethtool --set-channels) while still being limited to 64 interrupts, the IRQ assignment looks like this:
$ cat /proc/interrupts | fgrep pcap0 | awk '{print $NF}'
ice-pcap0-TxRx-0-1
ice-pcap0-TxRx-2-3
ice-pcap0-TxRx-4-5
[...lines deleted...]
ice-pcap0-TxRx-108-109
ice-pcap0-TxRx-110-111
ice-pcap0-TxRx-112
ice-pcap0-TxRx-113
ice-pcap0-TxRx-114
ice-pcap0-TxRx-115
ice-pcap0-TxRx-116
ice-pcap0-TxRx-117
ice-pcap0-TxRx-118
ice-pcap0-TxRx-119
Two RSS queues are tied to one interrupt, which is not what you want.
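A quick way to spot this mismatch (pcap0 is our interface name, adjust to yours) is to compare the configured number of combined channels with the number of interrupts the driver actually allocated:
$ ethtool --show-channels pcap0 | grep -A 4 'Current hardware settings'
$ grep -c pcap0 /proc/interrupts
If the interrupt count is lower than the current Combined value, several queues are sharing one interrupt.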
We asked Intel before (you can find the thread on their forum) and they responded that 64 was the maximum. However, we found out it is possible to increase the number of interrupts and I would like to share this with you!
When reading the ice driver man page that comes with the source code, you'll see this: "The driver will not automatically allocate more than 64 MSI-X vectors for each PF." If you read on, it also states "You can use sysfs to override the automatic MSI-X vector allocation for a particular PF" and indeed, you can.
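If the man page was installed along with the out-of-tree driver, you can look the passage up directly (the exact wording may differ per driver release):
$ man ice | grep -i -A 2 'MSI-X'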
First, you'll need the PCI address of your Intel E810:
$ ethtool -i pcap0 | grep bus-info
bus-info: 0000:9c:00.0
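As an alternative, the same address can be read from sysfs (again, pcap0 is our interface name):
$ basename $(readlink /sys/class/net/pcap0/device)
0000:9c:00.0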
With that, you can obtain the current settings:
$ devlink resource show pci/0000:9c:00.0
pci/0000:9c:00.0:
name msix size 1024 occ 197 unit entry dpipe_tables none
resources:
name msix_misc size 68 unit entry dpipe_tables none
name msix_eth size 64 occ 64 unit entry size_min 1 size_max 256 size_gran 1 dpipe_tables none
name msix_vf size 827 occ 0 unit entry size_min 0 size_max 953 size_gran 1 dpipe_tables none
name msix_rdma size 65 occ 65 unit entry size_min 2 size_max 257 size_gran 1 dpipe_tables none
Please note the line that shows the PF got 64 interrupts: msix_eth size 64.
To increase this number, you also have to decrease the number of interrupts elsewhere, because of the 1024 upper limit (something the man page doesn't tell you). So, suppose I want my PF to have 120 interrupts (128 cores in total, 8 of which we reserve for the kernel and other tooling), I'll need an extra 56 interrupts. We "steal" those from the VFs, as they are not used anyway (the arithmetic is spelled out right after the commands):
# devlink resource set pci/0000:9c:00.0 path msix/msix_vf size 771
# devlink resource set pci/0000:9c:00.0 path msix/msix_eth size 120
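For clarity, the numbers work out like this (plain shell arithmetic, based on the sizes in the devlink output above):
$ echo $((120 - 64))   # extra vectors the PF needs on top of its current 64
56
$ echo $((827 - 56))   # new msix_vf size after handing those 56 vectors to the PF
771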
Check it, and note the size_new fields:
$ devlink resource show pci/0000:9c:00.0
pci/0000:9c:00.0:
name msix size 1024 occ 197 unit entry dpipe_tables none size_valid true
resources:
name msix_misc size 68 unit entry dpipe_tables none
name msix_eth size 64 size_new 120 occ 64 unit entry size_min 1 size_max 256 size_gran 1 dpipe_tables none
name msix_vf size 827 size_new 771 occ 0 unit entry size_min 0 size_max 953 size_gran 1 dpipe_tables none
name msix_rdma size 65 occ 65 unit entry size_min 2 size_max 257 size_gran 1 dpipe_tables none
Activate and validate:
# devlink dev reload pci/0000:9c:00.0 action driver_reinit
reload_actions_performed:
driver_reinit
$ devlink resource show pci/0000:9c:00.0
pci/0000:9c:00.0:
name msix size 1024 occ 197 unit entry dpipe_tables none
resources:
name msix_misc size 68 unit entry dpipe_tables none
name msix_eth size 120 occ 120 unit entry size_min 1 size_max 256 size_gran 1 dpipe_tables none
name msix_vf size 771 occ 0 unit entry size_min 0 size_max 953 size_gran 1 dpipe_tables none
name msix_rdma size 65 occ 65 unit entry size_min 2 size_max 257 size_gran 1 dpipe_tables none
The result of the re-init can also be seen using dmesg.
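For example (the grep pattern simply matches the PCI address used above; the exact messages depend on the driver version):
$ sudo dmesg | grep '0000:9c:00.0' | tail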
Now you can use the familiar set_irq_affinity script and ethtool --set-channels to configure 120 interrupts and 120 RSS queues.
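A sketch of those two steps (the CPU list 8-127 matches our "8 cores reserved for the kernel" setup, and set_irq_affinity ships with the Intel driver source, typically under scripts/, so adjust both to your environment):
# ethtool --set-channels pcap0 combined 120
# ./set_irq_affinity 8-127 pcap0
Afterwards, the channel and interrupt counts should look similar to this: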
$ ethtool --show-channels pcap0
Channel parameters for pcap0:
Pre-set maximums:
RX: 256
TX: 256
Other: 1
Combined: 256
Current hardware settings:
RX: 0
TX: 0
Other: 1
Combined: 120
$ cat /proc/interrupts | fgrep pcap0 | awk '{print $NF}' | wc -l
120
Some details:
- we’re running the latest Debian version
- the Intel ice driver is version 1.17.8 (verified below)
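One way to verify which driver and version are actually loaded for the interface:
$ ethtool -i pcap0 | grep -E '^(driver|version):'
driver: ice
version: 1.17.8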
Our next step is to run all kinds of performance tests with this new hardware setup. I’ll keep you posted if we have something interesting to share.
Good luck and hope this helps.
Cheers,
John