Reduce CPU and % drops

Good afternoon. We have installed Suricata and are running into problems during setup: high CPU load (~90%) and a high drop rate (~35-40%).

I don’t know which category to put this topic in.
The characteristics of the server are as follows:

  • CPU
    CPU(s): 32
    NUMA:
    NUMA node(s): 2
    NUMA node0 CPU(s): 0-7,16-23
    NUMA node1 CPU(s): 8-15,24-31

  • network - Mellanox 40G

  • RAM - 314GB

  • traffic
    minimum - 4.5 Gbit/s
    maximum - 12.5 Gbit/s
    average - 9 Gbit/s

  • Kernel version 5.14.0-362.13.1.el9_3.x86_64

  • Suricata: This is Suricata version 7.0.2 RELEASE

I am attaching our configuration file suricata.yaml:
suricata.yaml (69.2 KB)

We need help with tuning to reduce the drops and the CPU load.

Please provide the stats.log and the suricata.log as well.

Date: 1/15/2024 – 11:37:24 (uptime: 0d, 00h 03m 57s)

Counter | TM Name | Value

capture.kernel_packets | Total | 321944864
capture.kernel_drops | Total | 168722943
capture.afpacket.polls | Total | 91
capture.afpacket.poll_data | Total | 75
decoder.pkts | Total | 121611987
decoder.bytes | Total | 71431665723
decoder.ipv4 | Total | 121600189
decoder.ipv6 | Total | 3721
decoder.ethernet | Total | 121611987
decoder.arp | Total | 8066
decoder.unknown_ethertype | Total | 11
decoder.tcp | Total | 53100961
tcp.syn | Total | 5414695
tcp.synack | Total | 325254
tcp.rst | Total | 798145
decoder.udp | Total | 68450892
decoder.esp | Total | 24639
decoder.icmpv4 | Total | 24546
decoder.icmpv6 | Total | 2872
decoder.vlan | Total | 121611976
decoder.avg_pkt_size | Total | 587
decoder.max_pkt_size | Total | 2144
flow.total | Total | 7103633
flow.tcp | Total | 6947049
flow.udp | Total | 147784
flow.icmpv4 | Total | 6930
flow.icmpv6 | Total | 1234
flow.tcp_reuse | Total | 160
flow.wrk.spare_sync_avg | Total | 97
flow.wrk.spare_sync | Total | 68025
flow.wrk.spare_sync_incomplete | Total | 17132
flow.wrk.flows_evicted_needs_work | Total | 5403938
flow.wrk.flows_evicted_pkt_inject | Total | 5496482
flow.wrk.flows_evicted | Total | 4123574
flow.wrk.flows_injected | Total | 5403842
flow.wrk.flows_injected_max | Total | 501
tcp.sessions | Total | 5406858
tcp.ssn_from_cache | Total | 287599
tcp.ssn_from_pool | Total | 5119259
tcp.pseudo | Total | 3908
tcp.invalid_checksum | Total | 38
tcp.ack_unseen_data | Total | 249579
tcp.segment_from_cache | Total | 2842834
tcp.segment_from_pool | Total | 376967
tcp.stream_depth_reached | Total | 317
tcp.reassembly_gap | Total | 255482
tcp.overlap | Total | 14132
detect.alert | Total | 1283
detect.alerts_suppressed | Total | 18093
app_layer.flow.http | Total | 2830
app_layer.tx.http | Total | 3012
app_layer.error.http.parser | Total | 8
app_layer.flow.ftp | Total | 163
app_layer.tx.ftp | Total | 5443
app_layer.error.ftp.gap | Total | 33
app_layer.flow.tls | Total | 80547
app_layer.error.tls.gap | Total | 18211
app_layer.error.tls.parser | Total | 299
app_layer.flow.ssh | Total | 107
app_layer.error.ssh.gap | Total | 46
app_layer.flow.dns_tcp | Total | 3
app_layer.tx.dns_tcp | Total | 6
app_layer.flow.ftp-data | Total | 185
app_layer.flow.ike | Total | 71
app_layer.tx.ike | Total | 104
app_layer.flow.quic | Total | 381
app_layer.tx.quic | Total | 253
app_layer.error.quic.parser | Total | 5
app_layer.flow.sip | Total | 62
app_layer.tx.sip | Total | 196
app_layer.error.sip.parser | Total | 136
app_layer.flow.rdp | Total | 4
app_layer.tx.rdp | Total | 4
app_layer.flow.http2 | Total | 1
app_layer.tx.http2 | Total | 18
app_layer.error.http2.gap | Total | 1
app_layer.flow.failed_tcp | Total | 1909
app_layer.flow.dns_udp | Total | 9139
app_layer.tx.dns_udp | Total | 13615
app_layer.flow.failed_udp | Total | 138131
flow.end.state.new | Total | 6684164
flow.end.state.established | Total | 70912
flow.end.state.closed | Total | 348557
flow.end.tcp_state.syn_sent | Total | 4985816
flow.end.tcp_state.syn_recv | Total | 11532
flow.end.tcp_state.established | Total | 52014
flow.end.tcp_state.fin_wait1 | Total | 3747
flow.end.tcp_state.fin_wait2 | Total | 2294
flow.end.tcp_state.time_wait | Total | 256
flow.end.tcp_state.last_ack | Total | 4642
flow.end.tcp_state.close_wait | Total | 2898
flow.end.tcp_state.closed | Total | 343659
flow.end.tcp_liberal | Total | 30136
flow.mgr.full_hash_pass | Total | 24
flow.mgr.rows_per_sec | Total | 22282
flow.spare | Total | 144578
flow.mgr.rows_maxlen | Total | 50
flow.mgr.flows_checked | Total | 43806692
flow.mgr.flows_notimeout | Total | 38428263
flow.mgr.flows_timeout | Total | 5378429
flow.mgr.flows_evicted | Total | 5378429
flow.mgr.flows_evicted_needs_work | Total | 4163007
memcap_pressure | Total | 17
memcap_pressure_max | Total | 30
flow.recycler.recycled | Total | 1699631
flow.recycler.queue_avg | Total | 42
flow.recycler.queue_max | Total | 23494
tcp.memuse | Total | 10151024
tcp.reassembly_memuse | Total | 1918560
http.memuse | Total | 8304
ftp.memuse | Total | 3
flow.memuse | Total | 573744464

suricata.log (9.2 KB)

The suricata.log is missing the startup portion.

You could also post an example htop run so we can see whether it's just one or a few cores peaking.
This can also be seen if you enable per-thread output for the stats (see below). The idea is to see whether some elephant flows kill the performance.
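A minimal sketch of what I mean, assuming the stock stats.log output section of suricata.yaml (values are examples only):

# suricata.yaml (sketch): emit per-thread counters in addition to the totals
stats:
  enabled: yes
  interval: 8

outputs:
  - stats:
      enabled: yes
      filename: stats.log
      totals: yes      # counters merged over all threads
      threads: yes     # per-thread counters, makes a single overloaded worker visible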

Another idea would be to run perf top -p $(pidof suricata) so we could see if there is another bottleneck.

It might also be worth assigning more cores to the capture/worker threads. What CPU is that exactly?

The network card is attached to NUMA node 1 and we probably got the processor binding wrong, since we have no experience setting it up:

$ cat /sys/class/net/ens6f0np0/device/numa_node
1
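Presumably the worker threads should then be pinned to the node 1 CPUs. A sketch of the threading section we are considering (untested, based only on the CPU ranges listed above):

threading:
  set-cpu-affinity: yes
  cpu-affinity:
    - management-cpu-set:
        cpu: [ 0, 1 ]               # keep management/flow-manager threads off the workers
    - worker-cpu-set:
        cpu: [ "8-15", "24-31" ]    # NUMA node 1 cores, same node as the Mellanox NIC
        mode: "exclusive"
        prio:
          default: "high"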

CPU - Intel(R) Xeon(R) CPU E5-2640 v3 @ 2.60GHz

What exactly do you want to see from the startup, which messages?

Samples: 2M of event 'cycles', 4000 Hz, Event count (approx.): 637044021882 lost: 0/0 drop: 0/0
Overhead  Shared Object         Symbol
  28.12%  libpcre2-8.so.0.11.0  [.] pcre2_match_8
   3.90%  suricata              [.] FlowGetFlowFromHash
   3.10%  libhs.so.5.4.1        [.] 0x00000000005f40c1
   1.74%  libc.so.6             [.] __memmove_avx_unaligned_erms
   1.32%  [kernel]              [k] tpacket_rcv
   1.09%  [kernel]              [k] __skb_flow_dissect
   1.08%  libc.so.6             [.] __memchr_avx2
   0.99%  [kernel]              [k] __netif_receive_skb_core.constprop.0
   0.96%  [kernel]              [k] __siphash_aligned
   0.92%  [kernel]              [k] memcpy_erms
   0.85%  [kernel]              [k] mlx5e_skb_from_cqe_mpwrq_nonlinear
   0.73%  [kernel]              [k] __packet_rcv_has_room
   0.70%  suricata              [.] 0x000000000031a47b
   0.64%  suricata              [.] DetectEngineContentInspection
   0.63%  libhs.so.5.4.1        [.] 0x00000000005f4110
   0.61%  libhs.so.5.4.1        [.] avx2_hs_scan
   0.57%  suricata              [.] 0x000000000031a478
   0.57%  [kernel]              [k] kmem_cache_free
   0.55%  libc.so.6             [.] pthread_mutex_lock@@GLIBC_2.2.5
   0.53%  suricata              [.] 0x000000000031ab18
   0.53%  suricata              [.] IPOnlyMatchPacket
   0.48%  libhs.so.5.4.1        [.] 0x00000000005f45a8
   0.48%  [kernel]              [k] mlx5e_build_rx_skb
   0.46%  [kernel]              [k] packet_rcv_fanout
   0.46%  [kernel]              [k] mlx5e_handle_rx_cqe_mpwrq
   0.45%  [kernel]              [k] mlx5e_fill_skb_data
   0.44%  libpcre2-8.so.0.11.0  [.] memcpy@plt
   0.41%  [kernel]              [k] packet_rcv
   0.39%  libhs.so.5.4.1        [.] 0x00000000005f4148
   0.38%  suricata              [.] SigMatchSignaturesGetSgh
   0.38%  libhs.so.5.4.1        [.] 0x00000000005f40e6
   0.37%  [kernel]              [k] skb_copy_bits
   0.36%  suricata              [.] 0x00000000001e48e2
   0.35%  suricata              [.] 0x000000000031a48e
   0.32%  suricata              [.] 0x000000000031ab06
   0.31%  suricata              [.] FlowHandlePacketUpdate
   0.31%  [kernel]              [k] ip_check_defrag.part.0
   0.29%  suricata              [.] DetectPcrePayloadMatch
   0.29%  libhs.so.5.4.1        [.] 0x000000000056f47c
   0.28%  suricata              [.] 0x000000000031a47d
   0.28%  [kernel]              [k] mlx5e_rx_cq_process_basic_cqe_comp
   0.27%  suricata              [.] SCHSSearch
   0.27%  [kernel]              [k] __napi_build_skb
   0.27%  [kernel]              [k] __build_skb_around
   0.25%  libhs.so.5.4.1        [.] 0x000000000056f470
   0.25%  suricata              [.] Prefilter

perf top -p $(pidof suricata)

So htop shows that all cores are maxed out.

I would like to see the full suricata.log to spot any other hints, and also the number of signatures loaded. Especially given that pcre2_match_8 has such a high overhead: which rulesets do you run, and do you have custom signatures?

For a performance baseline you could also run without any signatures in a test to see if there is already a performance issue even without any signatures being applied to the traffic.
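One way to do such a baseline run (a sketch; -S makes Suricata load signatures exclusively from the given file, so pointing it at an empty file means no rules are loaded):

# run with the existing config but zero rules (interface name is an example)
suricata -c /etc/suricata/suricata.yaml --af-packet=ens6f0np0 -S /dev/null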

With all the rules disabled and traffic of approximately 6.5G there are still drops, approximately 5%. Moreover, according to htop all cores are still maxed out; threads are set to 16.

The total number of rules is 78032.
We tried increasing the number of threads from 16 to 28 and then to 46, but the CPU load and the drops only got worse.

We tried to configure it following these instructions: 11.5. High Performance Configuration — Suricata 8.0.0-dev documentation.

It displays a message that the key is too long. Does this matter?

/usr/local/sbin/ethtool -X eth1 hkey 6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A equal 16

Error: Key is too long for device (41 > 40)
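Presumably this NIC only accepts a 40-byte RSS key (per the error above), so the low-entropy 6D:5A pattern would have to be repeated 20 times instead of 26; a sketch (untested on this card):

/usr/local/sbin/ethtool -X eth1 hkey 6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A equal 16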

The overhead of FlowGetFlowFromHash is quite big. Can you post the full suricata.log and stats.log for the test run without rules?

With 6.5G we would have around 500 Mbit/s per core, which could still be too much, but if more threads didn't help it must be something specific to the traffic.

Is this regular LAN and WAN traffic?
Is the traffic encapsulated?
Do you have bidirectional traffic?

The performance guide is for Intel cards, that won’t apply to Mellanox cards.

suricata.log (3.1 KB)
stats.log (74.6 KB)

In terms of traffic, this is regular traffic from the network edge; there are GRE and IPsec tunnels, and traffic is mirrored to the server via a TAP in both directions.

I would update the Suricata config: as you can see in the log, you're still using a lot of old values. Ideally start from the fresh Suricata 7 config we ship and carry your settings over.
Also see the warning about the block-size.
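For reference, the block-size warning comes from the af-packet section; a sketch of the relevant settings (values are placeholders, not a recommendation):

af-packet:
  - interface: ens6f0np0
    threads: 16
    cluster-id: 99
    cluster-type: cluster_flow
    defrag: yes
    tpacket-v3: yes
    ring-size: 200000       # per-thread ring; raise if kernel drops persist and memory allows
    block-size: 1048576     # must be a multiple of the page size and large enough for the frame size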

In addition to that please run:

sudo tshark -i ens6f0np0 -q -z conv,ip -a duration:5 | awk 'BEGIN{fail=0;ok=0;}{if($5==0){ print $3" -> "$1": "$7; fail+=$7;} else if($7==0) {print $1" -> "$3": "$5;fail+=$5} else {ok+=$5;ok+=$7;}}END{print "Unidirectional Traffic (Bytes): "fail"\nBidirectional Traffic (Bytes): "ok"\nUnidirectrional Traffic (Percentage): "int(fail/(fail+ok)*100)}'

I want to see if the traffic is really bidirectional.

We installed a fresh config and launched Suricata. Now, without rules and with 10G of traffic, the CPU load is at 30%. What changes are needed in the kernel and on the network card to process this much traffic?

With 10G of traffic a load of 30% doesn't sound too bad; what is the state with the rules enabled?

You could play around with the Mellanox tools and settings and also try other cluster_ modes like cluster_qm (sketched below), but based on the perf output I would still recommend checking the traffic itself, e.g. whether the majority of it is bidirectional.
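A sketch of the cluster_qm idea, with placeholder values: the NIC's RSS queue count is set to match the Suricata worker count so each worker is fed by one queue (the offload settings are general IDS-capture hygiene, not specific to this card):

# NIC side: 16 RSS queues to match 16 workers, offloads off (example values)
ethtool -L ens6f0np0 combined 16
ethtool -K ens6f0np0 gro off lro off tso off gso off

# suricata.yaml side
af-packet:
  - interface: ens6f0np0
    threads: 16
    cluster-id: 99
    cluster-type: cluster_qm   # fanout follows the NIC's RSS queues
    defrag: yes
    tpacket-v3: yes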

We tried other cluster modes and there were even more drops. On the network card we tried changing /usr/local/sbin/ethtool -L eth1 combined 16 as well as /usr/local/sbin/ethtool -G eth1 rx 1024, to no effect.

If we add the rules back, the CPU is at 99% and the drops are at 40%.