Another kernel_drops post :/

First time running Suricata - started from an old post about Elastic integration and ended up building v5 instead of v6.

Running in IDS mode, with a SPAN from our internet firewall’s LAN interface into an ESXi VM. Bandwidth is max 500Mbit up + down, but we would rarely see more than 100Mbit for both combined over a 5 minute average.

Initially I was seeing Suricata run for 2-3 days, then start dropping all packets (it would still be updating stats.log, but only capture.kernel_drops was increasing).

Based on some other posts here I set the following in main config:

mmap-locked: yes
tpacket-v3: yes
runmode: workers

Also stopped some other services running on the same VM, and disabled all but a single Suricata rule. Something regularly triggers the "ET DNS Non-DNS or Non-Compliant DNS traffic on DNS port Reserved Bit Set" event, so it’s handy to see that packets are still being processed and fast.log is getting updated.

Now it doesn’t hit a point where it’s dropping 100%, but I’m still seeing significant drops. If I monitor it after restarting the service, it’s somewhere between 0 and 1% for a while, but if I come back after a few days, the drop count is around 85% of capture.
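Since the capture counters are cumulative since the last restart, a percentage taken from the running totals can hide when the drops actually happened. Below is a minimal Python sketch that computes the drop rate between two samples instead. The function name and the sample numbers are made up for illustration, and it assumes capture.kernel_packets already includes the dropped packets, which may differ by capture method and version.

```python
def interval_drop_pct(prev, curr):
    """Drop percentage between two samples of the cumulative counters.

    prev and curr are (kernel_packets, kernel_drops) tuples, e.g. read
    from two consecutive stats.log dumps. Assumes kernel_packets counts
    dropped packets too (an assumption, check for your capture method).
    """
    d_packets = curr[0] - prev[0]
    d_drops = curr[1] - prev[1]
    if d_packets <= 0:
        # No traffic in the interval, or the counters were reset.
        return None
    return 100.0 * d_drops / d_packets


# Illustrative numbers only: the running total says 43% dropped,
# but within the last interval the rate is much higher.
earlier = (1000, 10)
later = (2000, 860)
print(interval_drop_pct(earlier, later))
```

Comparing the interval rate across the day should show whether the drops creep up steadily or arrive in bursts.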

Physical NIC is Intel i350; from the ESXi console it appears there’s only 1 RX queue. vNIC is VMXNET3.

CPUs are Xeon E5-2609, getting pretty old I guess, but I have bumped the VM up from 4 to 6 cores. Overall CPU utilisation on the host is around 15%, and Suricata usage in the VM sits around 5%.

Any suggestions on how I might identify the cause of these drops?

I have absolutely no experience with ESXi, but I would look into NIC passthrough for the Suricata VM.

Graphing CPU usage, RAM usage, throughput (packets and bytes) and drops over time might give some insight into what is causing the drops.
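For the drops part, one way to get something graphable is to pull the counters out of stats.log. This is a rough Python sketch, not an official tool: it assumes each dump starts with a "Date: ... (uptime: ...)" line and that counters appear as "name | thread | value" rows with a "Total" row, so adjust it if your stats.log looks different. The function name is my own.

```python
import re

# Assumed header format: "Date: 3/1/2021 -- 10:00:00 (uptime: 0d, ...)"
DATE_RE = re.compile(r"^Date:\s*(.+?)\s*\(uptime:")
WANTED = ("capture.kernel_packets", "capture.kernel_drops")


def parse_stats_log(text):
    """Return one dict per stats dump with the date and capture counters."""
    samples = []
    current = None
    for line in text.splitlines():
        m = DATE_RE.match(line)
        if m:
            current = {"date": m.group(1)}
            samples.append(current)
            continue
        parts = [p.strip() for p in line.split("|")]
        if current is None or len(parts) != 3:
            continue  # separator lines, or rows before the first Date line
        name, thread, value = parts
        # Only the "Total" rows, so per-thread rows are not double counted.
        if thread == "Total" and name in WANTED:
            current[name] = int(value)
    return samples
```

Feeding the result into a spreadsheet or any plotting tool, next to CPU and throughput numbers for the same timestamps, should make it easier to see what changes when the drops start.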

Turns out there was a bit of stupidity on my part here. When I searched for mmap-locked and tpacket-v3 in the config file, I didn’t see the config lines further up that set these for a specific interface, and that interface name didn’t match mine. Having corrected that and also increased buffer-size, I’m down to 3% drop, which might not be ideal, but is a massive improvement.
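In case it helps anyone else hitting the same thing, here is a small Python sketch for checking which interface names have their own entry in the af-packet section of the config. It is a plain line scan rather than a real YAML parser, it assumes the section is a top-level "af-packet:" key containing "- interface: NAME" items, and the interface names in the example are hypothetical.

```python
import re


def af_packet_interfaces(config_text):
    """List the interface names that have an entry under af-packet."""
    names = []
    in_section = False
    for line in config_text.splitlines():
        if re.match(r"^af-packet:\s*(#.*)?$", line):
            in_section = True
            continue
        if in_section:
            # The next unindented key ends the af-packet section.
            if re.match(r"^[A-Za-z0-9_-]+:", line):
                break
            m = re.match(r"^\s*-\s*interface:\s*([^\s#]+)", line)
            if m:
                names.append(m.group(1))
    return names


# Hypothetical config: the tuning sits under eth0, but the capture
# interface is ens192, so those settings would not be picked up for it.
example = """af-packet:
  - interface: eth0
    mmap-locked: yes
    tpacket-v3: yes
  - interface: default
"""
print("ens192" in af_packet_interfaces(example))
```

If the name you actually capture on is not in the list, the per-interface settings you edited are not the ones being applied to it.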