High capture.kernel_drops on openSUSE 15.4

I'm trying to configure Suricata for our infrastructure.
We have multiple machines acting as our edge, and every machine has 8 interfaces that handle routing the packets.
Suricata is deployed on these in IDS mode, with IPS planned for the future.

Right now I'm experiencing high capture.kernel_drops (about 5%) in IDS mode, and I need help finding the reason for this.

I'm currently using a 3-core CPU and about 3 GB of RAM.
I can change the CPU and memory for this, but I don't think these are the problem.

I have tried tweaking the configuration (a rough sketch of these settings follows below):
  • added Hyperscan
  • set detect.profile=high
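
Roughly, those two tweaks look like this in suricata.yaml (simplified sketch, everything else left at defaults; the Hyperscan matchers are also passed via --set on the command line):

    detect:
      profile: high

    # pattern matcher algorithms (Hyperscan)
    mpm-algo: hs
    spm-algo: hs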
And this is my Suricata systemd unit:

[Unit]
Description=Suricata Intrusion Detection Service
After=syslog.target network-online.target

[Service]
# Environment file to pick up $OPTIONS. On Fedora/EL this would be
# /etc/sysconfig/suricata, or on Debian/Ubuntu, /etc/default/suricata.
EnvironmentFile=-/etc/sysconfig/suricata
#EnvironmentFile=-/etc/default/suricata
ExecStartPre=/bin/rm -f /var/run/suricata.pid
ExecStart=/sbin/suricata -c /etc/suricata/suricata.yaml --pidfile /var/run/suricata.pid $OPTIONS
ExecReload=/bin/kill -USR2 $MAINPID

[Install]
WantedBy=multi-user.target
OPTIONS="--af-packet  --group suricata --user suricata --set mpm-algo=hs --set spm-algo=hs "
LD_PRELOAD="/usr/lib64/libtcmalloc.so.4.3.0"

Suricata configure options before the build:

configure --enable-gccprotect --enable-pie --disable-gccmarch-native \
        --disable-coccinelle --enable-nfqueue --enable-af-packet \
        --with-libnspr-includes=/usr/include/nspr4 \
        --with-libnss-includes=/usr/include/nss3 \
        --enable-jansson --enable-geoip --enable-lua --enable-hiredis \
        --enable-rust  \
        --enable-ebpf-build --enable-ebpf \
        --enable-python --with-clang=/usr/bin/clang

suricata-additional.yaml (3.3 KB)
suricata.yaml (84.0 KB)
stats.log (9.4 KB)

suricata-additional.yaml contains the main af-packet configuration and is included from suricata.yaml.
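
Roughly what that looks like (simplified sketch; the interface names, cluster ids and the lb.bpf path here are only illustrative, not my exact file):

    # suricata.yaml
    include:
      - suricata-additional.yaml

    # suricata-additional.yaml: one entry per monitored interface
    af-packet:
      - interface: eth0
        threads: auto
        cluster-id: 99
        cluster-type: cluster_ebpf
        ebpf-lb-file: /usr/libexec/suricata/ebpf/lb.bpf
        defrag: yes
      - interface: eth1
        threads: auto
        cluster-id: 98
        cluster-type: cluster_ebpf
        ebpf-lb-file: /usr/libexec/suricata/ebpf/lb.bpf
        defrag: yes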

  • Suricata version is 7.0.7
  • openSUSE Leap 15.4
  • Suricata was installed by compiling it from source

What traffic rate do you see there?

Can you also add the suricata.log/suricata.json logfile?

Any specific reason you chose cluster_ebpf?

What type of NIC do you use?

What ruleset do you use?

Monitoring 8 interfaces at one time with just 3 cores and 3 GB of memory sounds far too low.

This is the packets per second in Grafana for the last 24 hours:


and
suricata.log (5.0 KB)

We used cluster_ebpf for optimization and for filtering in the kernel before the packets reach Suricata.
I thought it would be a good choice for our environment performance-wise.
(Any recommendation would be great! I'm not an expert on this and would like to read up on it, even beyond Suricata.)

As for the NIC:

For rules, I ran suricata-update and my sources are the default ones.
suricata-sources.yaml (3.9 KB)

This is like a secondary instance, and I'm testing Suricata to get it production-ready.
I don't think there is much traffic on it, but I'm still seeing dropped packets, and it isn't fully using the memory and CPU. If you have suggestions on this, it would be great.
This is the usage:


last 24 hours:

Thanks for your time.

Do you also have the stats for the throughput in MBit/s?

Do you actually use any eBPF feature or just the bpf-filter?
You could test cluster_flow instead.
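
Something like this per interface in the af-packet section (sketch; keep your own interface names and any other options you already have):

    af-packet:
      - interface: ethX
        threads: auto
        cluster-id: 99
        cluster-type: cluster_flow   # flows are hashed to threads in the kernel, no lb.bpf needed
        defrag: yes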

You can also use the threaded stats log, so you can see whether the drops happen on a specific interface, or run suricatasc -c 'iface-stat ethX' to see the stats for each interface. The drops could be spikes or a temporary lack of resources like CPU power.
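
For example (sketch, assuming the default stats output in suricata.yaml and a running unix socket):

    # suricata.yaml: per-thread breakdown in stats.log
    outputs:
      - stats:
          enabled: yes
          filename: stats.log
          threads: yes   # per-thread counters instead of totals only

    # per-interface counters over the unix socket
    suricatasc -c iface-list
    suricatasc -c 'iface-stat ethX'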

For the throughput, I calculated this:
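(roughly: throughput in bit/s = packets per second x average packet size in bytes x 8; as an illustration only, about 25,000 pps at an average of 1,000 bytes per packet comes out to roughly 200 Mbit/s)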

Yeah, for now I use the bpf-filter only.
I think I tested cluster_flow before and it didn't help with the dropped packets.
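
It is just the bpf-filter option on the af-packet interfaces, something like this (the filter expression here is only a placeholder, not our real one):

    af-packet:
      - interface: ethX
        bpf-filter: "not host 192.0.2.1"   # placeholder BPF expression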

You are right, I think the drops are mostly on eth3:


How can I find out why that's happening?
The throughput on eth2 is about the same, but not nearly as many packets are dropped there.
Is there any way to tune this? If not, would scaling the CPU and memory help?

So except for the spike, the rate is around 200 Mbit/s?
You could plot the stats over time as well to see if you can find a pattern, e.g. drops increasing when the traffic rate increases (spikes, for example).

You could test increasing the memcaps as well.
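
For example (sketch; the values are only illustrative and have to fit into the memory you give the box):

    flow:
      memcap: 256mb
    stream:
      memcap: 128mb
      reassembly:
        memcap: 512mb
    defrag:
      memcap: 64mb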