High capture.kernel_drops

Help needed identifying the reason for a very high, constantly increasing capture.kernel_drops counter.
The TAP traffic fed into the NIC is not that high: roughly 1 Gbps on average, bursting to 2 Gbps.
The capture method is AF_PACKET.
This system should be able to handle this easily, but …
I tried playing with some AF_PACKET parameters but did not achieve a significant reduction in drops.
Is cluster-type: cluster_flow optimal to use with this 10 Gbps NIC?
Any help or hints appreciated!

Please include the following information with your help request:

  • Suricata version
    Suricata version 8.0.0-dev
  • Operating system and/or Linux distribution
    Linux gr-moa-selksl-01 6.1.0-21-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.90-1 (2024-05-03) x86_64 GNU/Linux
  • How you installed Suricata (from source, packages, something else)
    Suricata is running as a container and was installed as part of SELKS 10 by Stamus Networks

root@gr-moa-selksl-01:/home/selks-user# tail -f /opt/selksd/SELKS/docker/containers-data/suricata/logs/stats.log | grep capture
capture.kernel_packets | Total | 82424065
capture.kernel_drops | Total | 1461534

suricata.yaml

af-packet:

  - interface: ens192
    threads: 16 # or a number that is below half the number of cores available
    defrag: yes
    cluster-type: cluster_flow
    cluster-id: 99
    tpacket-v3: yes
    ring-size: 32768
    use-mmap: yes
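A variation worth experimenting with, sketched here with illustrative, untested values: cluster_qm hands each worker thread the packets of one NIC RSS queue (the 16 queues set via ethtool -L below), which avoids re-hashing in the kernel, and explicit tpacket-v3 block sizing gives more headroom for the 2 Gbps bursts. Note that cluster_qm generally wants a symmetric RSS hash so both directions of a flow land on the same queue.

af-packet:
  - interface: ens192
    threads: 16                # one worker per RSS queue
    cluster-type: cluster_qm   # bind workers to NIC RSS queues instead of kernel flow hashing
    cluster-id: 99
    defrag: yes
    use-mmap: yes
    tpacket-v3: yes
    ring-size: 200000          # illustrative; per-thread frames, sized to absorb bursts
    block-size: 1048576        # illustrative; larger tpacket-v3 blocks to buffer bursts
    block-timeout: 100         # pass a block to userspace after at most 100 ms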

Info about the system:

ESXi 7.0.3 VM with 2 Sockets X 24 CPUs and 128GB RAM

Hewlett-Packard Company Ethernet 10Gb 2-port 560SFP+ Adapter is mapped to the VM in Passthrough mode

/etc/network/interfaces

allow-hotplug ens192
iface ens192 inet manual
pre-up ethtool -L ens192 combined 16
pre-up ethtool -G ens192 rx 4096
pre-up ethtool -N ens192 rx-flow-hash tcp4 sdfn
pre-up ethtool -N ens192 rx-flow-hash udp4 sdfn
pre-up ethtool -N ens192 rx-flow-hash tcp6 sdfn
pre-up ethtool -N ens192 rx-flow-hash udp6 sdfn
up ip link set mtu 9000 dev ens192
up ip link set ens192 promisc on

root@gr-moa-selksl-01:/home/selks-user# ethtool -l ens192
Channel parameters for ens192:
Pre-set maximums:
RX: n/a
TX: n/a
Other: 1
Combined: 48
Current hardware settings:
RX: n/a
TX: n/a
Other: 1
Combined: 16

root@gr-moa-selksl-01:/home/selks-user# ethtool -g ens192
Ring parameters for ens192:
Pre-set maximums:
RX: 4096
RX Mini: n/a
RX Jumbo: n/a
TX: 4096
Current hardware settings:
RX: 512
RX Mini: n/a
RX Jumbo: n/a
TX: 512
RX Buf Len: n/a
CQE Size: n/a
TX Push: off
TCP data split: n/a

root@gr-moa-selksl-01:/home/selks-user# ethtool -x ens192
RX flow hash indirection table for ens192 with 16 RX ring(s):
0: 0 1 2 3 4 5 6 7
8: 8 9 10 11 12 13 14 15
16: 0 1 2 3 4 5 6 7
24: 8 9 10 11 12 13 14 15
32: 0 1 2 3 4 5 6 7
40: 8 9 10 11 12 13 14 15
48: 0 1 2 3 4 5 6 7
56: 8 9 10 11 12 13 14 15
64: 0 1 2 3 4 5 6 7
72: 8 9 10 11 12 13 14 15
80: 0 1 2 3 4 5 6 7
88: 8 9 10 11 12 13 14 15
96: 0 1 2 3 4 5 6 7
104: 8 9 10 11 12 13 14 15
112: 0 1 2 3 4 5 6 7
120: 8 9 10 11 12 13 14 15
RSS hash key:
35:69:5f:bb:7f:70:e1:e4:c9:a8:d0:10:d7:a7:35:fc:de:5c:96:54:75:df:a6:61:65:86:3c:37:75:7a:83:25:2e:44:c2:55:9c:f5:fd:45
RSS hash function:
toeplitz: on
xor: off
crc32: off

lspci

0b:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
DeviceName: pciPassthru0
Subsystem: Hewlett-Packard Company Ethernet 10Gb 2-port 560SFP+ Adapter
Physical Slot: 192
Flags: bus master, fast devsel, latency 64, IRQ 19
Memory at fd200000 (32-bit, non-prefetchable) [size=1M]
I/O ports at 5000 [size=32]
Memory at fd3fc000 (32-bit, non-prefetchable) [size=16K]
Capabilities: [40] Power Management version 3
Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
Capabilities: [70] MSI-X: Enable+ Count=64 Masked-
Capabilities: [a0] Express Endpoint, MSI 00
Capabilities: [e0] Vital Product Data
Capabilities: [100] Advanced Error Reporting
Capabilities: [140] Device Serial Number 14-02-ec-ff-ff-81-b3-b8
Capabilities: [150] Alternative Routing-ID Interpretation (ARI)
Capabilities: [160] Single Root I/O Virtualization (SR-IOV)
Kernel driver in use: ixgbe
Kernel modules: ixgbe

I wouldn’t call that very high; the drops are below 2% of kernel packets (1,461,534 / 82,424,065 ≈ 1.8%), but it is still not perfect.

Can you look at the stats over a period of time and check the deltas? Elephant flows could cause this.
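For example, a rough shell sketch to watch the delta (the stats.log path is the one from your tail command above, and the awk field positions match the "name | Total | value" layout shown there):

STATS=/opt/selksd/SELKS/docker/containers-data/suricata/logs/stats.log
prev=0
# Print how much capture.kernel_drops grew in each 10-second window
# (the first value printed is the absolute counter, not a delta).
while sleep 10; do
  cur=$(grep 'capture.kernel_drops' "$STATS" | tail -n1 | awk -F'|' '{gsub(/ /, "", $3); print $3}')
  echo "$(date +%T) kernel_drops delta: $((cur - prev))"
  prev=$cur
done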

Also run htop and check whether the load is balanced across the cores, or whether any core sits at 100% most of the time.
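If htop is awkward to get at inside the container, something like this gives a comparable view (assuming the process is named suricata and mpstat from the sysstat package is installed):

# Per-thread CPU usage of the Suricata worker threads:
top -H -p "$(pidof suricata)"
# Per-core utilisation sampled every 5 seconds; look for any core stuck near 100%:
mpstat -P ALL 5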

Also provide the full suricata.yaml and stats.log alongside suricata.log so we can check for other indicators.

Another idea is to do a run without signatures to see whether the drops still occur.
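One way to do that, assuming you can start Suricata manually against the same interface (inside the Suricata container on SELKS), is to load an empty rule file with -S:

# -S makes the given file the exclusive rule source; /dev/null means zero rules,
# so if drops persist the problem is capture/tuning rather than rule cost.
suricata -c /etc/suricata/suricata.yaml --af-packet=ens192 -S /dev/null -v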