Detected packet decrease after Suricata restart

This is a 10G Suricata IDS.

  1. Noticed a loss of detection packets.
  2. Tried changing memcap from 1 GB to 16 GB.
  3. Restarted Suricata.
  4. The detection log volume dropped to almost 1/1000 of normal.
  5. “capture.kernel_drops” is larger than “capture.kernel_packets” in the stats.log file.
    ex) capture.kernel_packets : 5910262
    ex) capture.kernel_drops : 689554867
  6. “ifconfig ens1f0” shows many drops (NIC-level check commands are sketched after this list).
  7. “ethtool ens1f0” reports “Supported link modes: Not reported”.
  8. “ethtool ens1f1” reports “Supported link modes: 10000baseT/Full”.
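
A sketch of commands that could be used to double-check the NIC-level drops (assuming ens1f0 is the capture interface):

# driver-level counters usually show where the drops happen (rx_dropped, rx_missed, etc.)
ethtool -S ens1f0 | grep -iE 'drop|miss|error'

# kernel interface statistics, the same counters ifconfig summarizes
ip -s link show ens1f0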

Does anyone know of a phenomenon like this?
I tried “ifconfig ens1f0 down/up” but it didn’t work.

Could you please share:
What Suricata version are you running, and what NIC/OS/kernel versions?
How do you restart Suricata, and could you post two stats.log updates, one from before and one from after the restart?
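
If it helps, a rough sketch of commands to collect that information (interface name and paths are assumptions):

suricata -V                   # Suricata version
ethtool -i ens1f0             # NIC driver name and driver version
cat /etc/os-release           # OS release
uname -r                      # kernel version
tail -n 250 [path]/stats.log  # last couple of stats.log snapshots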

Current Status

  1. Restart the Suricata service.
  2. CPU usage and detection log volume drop (caused by the service restart).
  3. CPU usage increases little by little over several hours to a day, sometimes never recovering (it holds at almost 10%, while normal CPU usage is 60%).
  4. At some point CPU usage jumps sharply (10% -> 60%) and the service returns to normal.

Problem

  1. In the situation above, the wait for step 4 (service normalization) can be unbounded.
  2. Detection logs are still being lost even after step 4 (service normalization) occurs.

Version

  • Suricata : 6.0
  • NIC : driver i40e, version 2.4.6
  • OS : CentOS 8.1.1911
  • Kernel : 4.18.0-147.5.1.el8_1.x86_64

suricata.yaml (modified settings)

  • stream.memcap : 1024mb
  • stream.reassembly.memcap : 4096mb
  • stream.reassembly.depth : 100mb
  • stream.reassembly.raw : yes (for “http_uri” option)

ifconfig ens1f0

ens1f0: flags=4419<UP,BROADCAST,RUNNING,PROMISC,MULTICAST> mtu 1500
ether xx:xx:xx:xx:xx:xx txqueuelen 1000 (Ethernet)
RX packets 23894985400207 bytes 277088415324988 (252.0 TiB)
RX errors 0 dropped 219017771 overruns 0 frame 10
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

stats1.log -> restart -> stats2.log

Suricata restart

kill 11111 [Suricata PID]

[path]/suricata -c [path]/suricata.yaml --pfring-int=ens1f0 --pfring-cluster-id=99 --pfring-cluster-type=cluster_flow --pidfile=[path]/suricata.pid -D
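
Spelled out as a small script, the restart roughly looks like this (paths are placeholders as above; kill sends SIGTERM by default, and the loop simply waits for the old process to exit before starting the new one):

PID=$(cat [path]/suricata.pid)
kill "$PID"                                          # SIGTERM: ask Suricata to shut down cleanly
while kill -0 "$PID" 2>/dev/null; do sleep 1; done   # wait for the old process to exit

[path]/suricata -c [path]/suricata.yaml --pfring-int=ens1f0 \
  --pfring-cluster-id=99 --pfring-cluster-type=cluster_flow \
  --pidfile=[path]/suricata.pid -D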

stats1.log

Date: 1/5/2021 – 17:41:28 (uptime: 1d, 00h 55m 39s)

Counter | TM Name | Value

capture.kernel_packets | Total | 152855051850
capture.kernel_drops | Total | 37496570617
decoder.pkts | Total | 152856914452
decoder.bytes | Total | 32115488864006
decoder.invalid | Total | 532752
decoder.ipv4 | Total | 152858978002
decoder.ipv6 | Total | 94877
decoder.ethernet | Total | 152856914452
decoder.tcp | Total | 98314238060
decoder.udp | Total | 54471638542
decoder.sctp | Total | 70
decoder.icmpv4 | Total | 34891041
decoder.icmpv6 | Total | 858
decoder.gre | Total | 116
decoder.vlan | Total | 1162070597
decoder.vxlan | Total | 1539
decoder.teredo | Total | 25162
decoder.avg_pkt_size | Total | 210
decoder.max_pkt_size | Total | 1518
flow.tcp | Total | 531794194
flow.udp | Total | 57792694
flow.icmpv4 | Total | 636032
flow.icmpv6 | Total | 341
defrag.ipv4.fragments | Total | 19185524
defrag.ipv4.reassembled | Total | 3036927
defrag.ipv6.fragments | Total | 84
decoder.event.icmpv4.unknown_code | Total | 192
decoder.event.icmpv6.unknown_type | Total | 1
decoder.event.icmpv6.unknown_code | Total | 12
decoder.event.icmpv6.unassigned_type | Total | 55
decoder.event.ipv6.exthdr_ah_res_not_null | Total | 29
decoder.event.ipv6.fh_non_zero_reserved_field | Total | 83
decoder.event.ipv6.data_after_none_header | Total | 169
decoder.event.ipv6.unknown_next_header | Total | 23830
decoder.event.ipv6.icmpv4 | Total | 606
decoder.event.tcp.opt_invalid_len | Total | 643
decoder.event.udp.pkt_too_small | Total | 253
decoder.event.udp.hlen_invalid | Total | 553
decoder.event.vlan.unknown_type | Total | 532054
decoder.event.ipv4.frag_overlap | Total | 22408
decoder.event.ipv4.frag_ignored | Total | 5172685
decoder.event.ipv6.frag_ignored | Total | 1
decoder.event.ipv6.ipv4_in_ipv6_wrong_version | Total | 240
decoder.event.ipv6.ipv6_in_ipv6_wrong_version | Total | 43
tcp.sessions | Total | 262206910
tcp.pseudo | Total | 103348
tcp.invalid_checksum | Total | 40
tcp.syn | Total | 279741467
tcp.synack | Total | 216717803
tcp.rst | Total | 125924330
tcp.pkt_on_wrong_thread | Total | 4
tcp.segment_memcap_drop | Total | 10712304888
tcp.stream_depth_reached | Total | 1960
tcp.reassembly_gap | Total | 7580094131
tcp.overlap | Total | 268468
tcp.insert_data_normal_fail | Total | 37137443735
tcp.insert_data_overlap_fail | Total | 1211
detect.alert | Total | 297736
app_layer.flow.http | Total | 3223182
app_layer.tx.http | Total | 5061704
app_layer.flow.ftp | Total | 1728
app_layer.tx.ftp | Total | 14078
app_layer.flow.smtp | Total | 2939
app_layer.tx.smtp | Total | 3245
app_layer.flow.tls | Total | 3891728
app_layer.flow.ssh | Total | 1446
app_layer.flow.smb | Total | 810
app_layer.tx.smb | Total | 15540
app_layer.flow.dcerpc_tcp | Total | 277
app_layer.flow.dns_tcp | Total | 298
app_layer.tx.dns_tcp | Total | 632
app_layer.flow.ntp | Total | 37477
app_layer.tx.ntp | Total | 37692
app_layer.flow.ftp-data | Total | 13
app_layer.flow.tftp | Total | 10141
app_layer.tx.tftp | Total | 9749
app_layer.flow.ikev2 | Total | 518
app_layer.tx.ikev2 | Total | 171
app_layer.flow.krb5_tcp | Total | 120
app_layer.tx.krb5_tcp | Total | 120
app_layer.flow.dhcp | Total | 5
app_layer.tx.dhcp | Total | 41
app_layer.flow.snmp | Total | 3312856
app_layer.tx.snmp | Total | 7729415
app_layer.flow.failed_tcp | Total | 1639400
app_layer.flow.dcerpc_udp | Total | 232
app_layer.flow.dns_udp | Total | 46098785
app_layer.tx.dns_udp | Total | 92827831
app_layer.flow.krb5_udp | Total | 405
app_layer.flow.failed_udp | Total | 8332275
flow_mgr.closed_pruned | Total | 155721008
flow_mgr.new_pruned | Total | 294487981
flow_mgr.est_pruned | Total | 31927545
flow.emerg_mode_entered | Total | 25609
flow.emerg_mode_over | Total | 25608
flow.tcp_reuse | Total | 35243
flow_mgr.flows_checked | Total | 9555
flow_mgr.flows_notimeout | Total | 9529
flow_mgr.flows_timeout | Total | 26
flow_mgr.flows_timeout_inuse | Total | 1
flow_mgr.flows_removed | Total | 25
flow_mgr.rows_checked | Total | 65536
flow_mgr.rows_skipped | Total | 64162
flow_mgr.rows_maxlen | Total | 17
tcp.memuse | Total | 32112640
tcp.reassembly_memuse | Total | 5505024
ftp.memuse | Total | 1424648
app_layer.expectations | Total | 182
flow.memuse | Total | 134217440

stats2.log

Date: 1/5/2021 – 17:42:06 (uptime: 0d, 00h 00m 10s)

Counter | TM Name | Value

capture.kernel_packets | Total | 3541000
capture.kernel_drops | Total | 8457568
decoder.pkts | Total | 4061057
decoder.bytes | Total | 842845532
decoder.invalid | Total | 17
decoder.ipv4 | Total | 4061026
decoder.ipv6 | Total | 1
decoder.ethernet | Total | 4061057
decoder.tcp | Total | 2630823
decoder.udp | Total | 1428958
decoder.icmpv4 | Total | 700
decoder.vlan | Total | 19240
decoder.teredo | Total | 1
decoder.avg_pkt_size | Total | 207
decoder.max_pkt_size | Total | 1518
flow.tcp | Total | 115323
flow.udp | Total | 20939
flow.icmpv4 | Total | 76
decoder.event.ipv6.unknown_next_header | Total | 1
decoder.event.vlan.unknown_type | Total | 17
tcp.sessions | Total | 7012
tcp.pseudo | Total | 6
tcp.syn | Total | 7036
tcp.synack | Total | 6340
tcp.rst | Total | 3676
tcp.reassembly_gap | Total | 383
tcp.overlap | Total | 86
detect.alert | Total | 5
app_layer.flow.http | Total | 1583
app_layer.tx.http | Total | 2474
app_layer.flow.smtp | Total | 4
app_layer.tx.smtp | Total | 4
app_layer.flow.tls | Total | 1906
app_layer.flow.ssh | Total | 2
app_layer.flow.smb | Total | 1
app_layer.tx.smb | Total | 4
app_layer.flow.snmp | Total | 22
app_layer.tx.snmp | Total | 36
app_layer.flow.failed_tcp | Total | 648
app_layer.flow.dcerpc_udp | Total | 2
app_layer.flow.dns_udp | Total | 1093
app_layer.tx.dns_udp | Total | 2164
app_layer.flow.failed_udp | Total | 19822
flow.spare | Total | 9930
flow_mgr.flows_checked | Total | 29239
flow_mgr.flows_notimeout | Total | 29239
flow_mgr.rows_checked | Total | 65536
flow_mgr.rows_skipped | Total | 55735
flow_mgr.rows_maxlen | Total | 11
tcp.memuse | Total | 32113744
tcp.reassembly_memuse | Total | 68021024
http.memuse | Total | 53169097
flow.memuse | Total | 56395832

I see a lot of memcap hits after 1 day according to the stats.
I see you are running pfring; can you make sure the pfring buffers are maxed out during the kernel module load (I think 65535 is the max)? Also set max-pending-packets in suricata.yaml to a higher value, 32k or so.
As a test, I would also suggest lowering the depth: stream.reassembly.depth : 1mb
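
A rough sketch of what I mean; the module parameter name is from memory, so please check it against your pf_ring build with modinfo first:

# stop Suricata first so the module is not in use, then reload pf_ring with bigger ring buffers
rmmod pf_ring
modprobe pf_ring min_num_slots=65535

# and in suricata.yaml:
#   max-pending-packets: 32768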

  • stream.reassembly.depth : 1mb = did not help.
  • max-pending-packets : 32768 = did not help.
  • pfring buffers check = how can I check the pfring buffer value?

In addition, I tried those changes and restarted the service, and the CPU has been operating abnormally ever since.

  • Current CPU usage: 10% ~ 20%
  • Normal CPU usage: 40% ~ 70%

I also keep getting the following message in suricata.log. Is this a problem?

  • Flow emergency mode over, back to normal… unsetting FLOW_EMERGENCY bit (ts.tv_sec: 1610102681, ts.tv_usec:866895) flow_spare_q status(): 224% flows at the queue.

In addition, checking stats.log confirmed that most of the packets are being dropped: capture.kernel_packets is far smaller than capture.kernel_drops.
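
A simple way to pull those two counters out of stats.log (path is a placeholder):

grep -E 'capture\.kernel_(packets|drops)' [path]/stats.log | tail -n 4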

On other, normal Suricata servers, drops were confirmed to be 0.
I think the performance of this server is sufficient (see below), so I presume Suricata is failing to keep up with packets because of the suricata.yaml configuration.
  • CPU: 64 cores
  • MEM: 256 GB
  • NIC: 10 G

Could you suggest suricata.yaml settings?

I would suggest following the Suricata docs on performance optimization and tuning:
https://suricata.readthedocs.io/en/suricata-6.0.1/performance/tuning-considerations.html

and
https://suricata.readthedocs.io/en/suricata-6.0.1/performance/high-performance-config.html

However, there are some specific NIC/pfring settings that you should consider looking into as well.
I personally have not used pfring in a while, but when you load the pfring module you can specify buffer sizes.
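
For example, something along these lines; the exact parameters and /proc layout depend on the pf_ring version, so treat it as a sketch:

modinfo pf_ring               # lists the module parameters your pf_ring build supports
cat /proc/net/pf_ring/info    # shows the slot/ring sizes currently in effect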

It seems you have flow starvation, so increasing the flow memcap will only help so far. If you can, look into the traffic mirror and confirm things are as expected there (sometimes unintended traffic ends up being mirrored, etc.).
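
If you want to sanity-check the mirror, a short capture on the interface is usually enough to spot unexpected traffic (interface name and output path are just examples):

tcpdump -nn -i ens1f0 -c 100000 -w /tmp/mirror_sample.pcap
capinfos /tmp/mirror_sample.pcap                 # rates, duration, packet sizes (from wireshark-cli)
tcpdump -nn -r /tmp/mirror_sample.pcap | awk '{print $3}' | sort | uniq -c | sort -rn | head   # rough top talkers by source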