Suricata high capture.kernel_drops count. I am using PF_RING ZC mode.

The cores are peaking at 100%, so this does look bad. Can you also run perf top -p $(pidof suricata)? And how many rules are enabled?

I would also still give AF_PACKET a try, which is the best supported mode IMHO.
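
Roughly something like the following should do; the log path is just the default location and may differ on your box:

# live view of where Suricata spends CPU
perf top -p $(pidof suricata)

# the number of loaded rules is printed at startup
grep -i "rules successfully loaded" /var/log/suricata/suricata.log

# kernel packet/drop counters, if the unix-command socket is enabled
suricatasc -c dump-counters | grep -E "kernel_(packets|drops)"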

The cores are peaking at about 80%-92%. When I run Suricata in autofp mode, the cores peak at about 60%-80%, but the ratio of capture.kernel_packets to capture.kernel_drops is about 1:3. So I think it's not a CPU problem.


Following your advice, I ran Suricata in AF_PACKET mode: suricata --af-packet -c /etc/suricata/suricata.yaml -vvvv

af-packet:
  - interface: eth2
    # Number of receive threads. "auto" uses the number of cores
    #threads: auto
    # Default clusterid. AF_PACKET will load balance packets based on flow.
    cluster-id: 99
    # Default AF_PACKET cluster type. AF_PACKET can load balance per flow or per hash.
    # This is only supported for Linux kernel > 3.1
    # possible value are:
    #  * cluster_flow: all packets of a given flow are sent to the same socket
    #  * cluster_cpu: all packets treated in kernel by a CPU are sent to the same socket
    #  * cluster_qm: all packets linked by network card to a RSS queue are sent to the same
    #  socket. Requires at least Linux 3.14.
    #  * cluster_ebpf: eBPF file load balancing. See doc/userguide/capture-hardware/ebpf-xdp.rst for
    #  more info.
    # Recommended modes are cluster_flow on most boxes and cluster_cpu or cluster_qm on system
    # with capture card using RSS (requires cpu affinity tuning and system IRQ tuning)
    cluster-type: cluster_flow
    # In some fragmentation cases, the hash can not be computed. If "defrag" is set
    # to yes, the kernel will do the needed defragmentation before sending the packets.
    defrag: yes
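
(Aside: since the cluster_qm comment above depends on NIC RSS queues, the current queue count, ring sizes and offload state can be checked with something like the following; eth2 is assumed to be the capture interface.)

# number of RSS/channel queues the NIC currently exposes
ethtool -l eth2
# current and maximum RX/TX ring sizes
ethtool -g eth2
# offloads that merge packets before capture
ethtool -k eth2 | grep -E "generic-receive-offload|large-receive-offload"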


31/8/2022 -- 18:35:32 - <Perf> - AutoFP - Total flow handler queues - 48
31/8/2022 -- 18:35:32 - <Perf> - (RX#43-eth2) Kernel: Packets 2795988, dropped 2491963
31/8/2022 -- 18:35:32 - <Perf> - AutoFP - Total flow handler queues - 48
31/8/2022 -- 18:35:32 - <Perf> - (RX#44-eth2) Kernel: Packets 2761698, dropped 2444301
31/8/2022 -- 18:35:32 - <Perf> - AutoFP - Total flow handler queues - 48
31/8/2022 -- 18:35:32 - <Perf> - (RX#45-eth2) Kernel: Packets 2980594, dropped 2662848
31/8/2022 -- 18:35:32 - <Perf> - AutoFP - Total flow handler queues - 48
31/8/2022 -- 18:35:32 - <Perf> - (RX#46-eth2) Kernel: Packets 2804426, dropped 2488253
31/8/2022 -- 18:35:32 - <Perf> - AutoFP - Total flow handler queues - 48
31/8/2022 -- 18:35:32 - <Perf> - (RX#47-eth2) Kernel: Packets 2941815, dropped 2621890
31/8/2022 -- 18:35:32 - <Perf> - AutoFP - Total flow handler queues - 48
31/8/2022 -- 18:35:32 - <Perf> - (RX#48-eth2) Kernel: Packets 3613353, dropped 3290188
31/8/2022 -- 18:35:32 - <Perf> - AutoFP - Total flow handler queues - 48
31/8/2022 -- 18:35:38 - <Info> - Alerts: 73
31/8/2022 -- 18:35:38 - <Perf> - ippair memory usage: 414144 bytes, maximum: 16777216
^C31/8/2022 -- 18:35:39 - <Perf> - Done dumping profiling data.
31/8/2022 -- 18:35:39 - <Perf> - host memory usage: 39814400 bytes, maximum: 21474836480
31/8/2022 -- 18:35:39 - <Perf> - Dumping profiling data for 1 rules.
31/8/2022 -- 18:35:39 - <Perf> - Done dumping profiling data.
31/8/2022 -- 18:35:39 - <Perf> - Done dumping keyword profiling data.
31/8/2022 -- 18:35:39 - <Perf> - Done dumping rulegroup profiling data.
31/8/2022 -- 18:35:39 - <Perf> - Done dumping prefilter profiling data.
31/8/2022 -- 18:35:39 - <Info> - cleaning up signature grouping structure... complete
31/8/2022 -- 18:35:39 - <Notice> - Stats for 'eth2':  pkts: 135537041, drop: 120452507 (88.87%), invalid chksum: 0
31/8/2022 -- 18:35:39 - <Perf> - Cleaning up Hyperscan global scratch
31/8/2022 -- 18:35:39 - <Perf> - Clearing Hyperscan database cache

I did overlook one important thing:

  Profiling enabled:                       yes

Don't run with profiling enabled unless you want to debug something code-specific; profiling is very resource intensive. Rebuild Suricata without it enabled and try again.

And with the new build, run perf top again to see if handle_mm_fault is still there.
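
If it helps, something along these lines verifies the build and rebuilds it; keep whatever other configure options you used originally, just drop --enable-profiling:

# confirm whether the running binary has profiling compiled in
suricata --build-info | grep -i profiling

# rebuild from source without the profiling flag
./configure        # your usual options, minus --enable-profiling
make && sudo make install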

Your advice is very helpful, but Suricata still drops a lot of packets. It drops a large burst of packets about every ten minutes, which really confuses me.



The CPU utilization is very low just now. But when I disabled the file-store option, it went into this state and did not drop packets.


Filestore is also quite resource intensive, but the perf top output also indicates that maybe PF_RING needs to be improved. The overhead of ring_is_not_empty would be worth a look.
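
To see where that symbol is hit from, a call-graph profile is usually more telling than perf top alone; a rough sketch (the 30 second window is arbitrary):

# sample Suricata with call graphs for ~30 seconds
perf record -g -p $(pidof suricata) -- sleep 30
# browse the report; '/' lets you search for ring_is_not_empty
perf report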

Thanks for your reply. Could you tell me in more detail how to look into ring_is_not_empty? The CPU utilization is so low; I really want to get this fixed.

This sounds pf_ring specific. What NIC are you using?

This is the NIC I am using:

[root@sec-audit-lljx-027093 ~]#  lspci | grep -i eth
19:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
19:00.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
5e:00.0 Ethernet controller: Intel Corporation Ethernet Controller E810-C for QSFP (rev 02)
5e:00.1 Ethernet controller: Intel Corporation Ethernet Controller E810-C for QSFP (rev 02)
[root@sec-audit-lljx-027093 ~]# ifconfig eth2
eth2: flags=4419<UP,BROADCAST,RUNNING,PROMISC,MULTICAST>  mtu 1500
        inet6 fe80::b696:91ff:feb2:a9e8  prefixlen 64  scopeid 0x20<link>
        ether b4:96:91:b2:a9:e8  txqueuelen 1000  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 374546577  overruns 0  frame 0
        TX packets 4  bytes 980 (980.0 B)
        TX errors 4  dropped 0 overruns 0  carrier 0  collisions 0

Well there are two NICs, the Mellanox and the Intel one, so which one is used in that case :)?
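
ethtool would settle it; the driver line distinguishes the two (mlx5_core for the ConnectX-4 Lx, ice for the E810):

# show the driver and firmware bound to the capture interface
ethtool -i eth2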

The NIC is the Mellanox. When the file-store option is disabled, the cores peak at 40%-50% and everything is OK. When the file-store option is enabled, the cores peak at 20%-30% and Suricata begins to drop a lot of packets. That's very strange; I don't know why that happens.

 - file-store:
      version: 2
      enabled: yes

      # Set the directory for the filestore. Relative pathnames
      # are contained within the "default-log-dir".
      dir: filestore

      # Write out a fileinfo record for each occurrence of a file.
      # Disabled by default as each occurrence is already logged
      # as a fileinfo record to the main eve-log.
      write-fileinfo: yes

      # Force storing of all files. Default: no.
      #force-filestore: yes

      # Override the global stream-depth for sessions in which we want
      # to perform file extraction. Set to 0 for unlimited; otherwise,
      # must be greater than the global stream-depth value to be used.
      stream-depth: 200mb

      # Uncomment the following variable to define how many files can
      # remain open for filestore by Suricata. Default value is 0 which
      # means files get closed after each write to the file.
      max-open-files: 10000

      # Force logging of checksums: available hash functions are md5,
      # sha1 and sha256. Note that SHA256 is automatically forced by
      # the use of this output module as it uses the SHA256 as the
      # file naming scheme.
      #force-hash: [sha1, md5]
      # NOTE: X-Forwarded configuration is ignored if write-fileinfo is disabled
      # HTTP X-Forwarded-For support by adding an extra field or overwriting
      # the source or destination IP address (depending on flow direction)
      # with the one reported in the X-Forwarded-For HTTP header. This is
      # helpful when reviewing alerts for traffic that is being reverse
      # or forward proxied.
      xff:
        enabled: no
        # Two operation modes are available, "extra-data" and "overwrite".
        mode: extra-data
        # Two proxy deployments are supported, "reverse" and "forward". In
        # a "reverse" deployment the IP address used is the last one, in a
        # "forward" deployment the first IP address is used.
        deployment: reverse
        # Header name where the actual IP address will be reported. If more
        # than one IP address is present, the last IP address will be the
        # one taken into consideration.
        header: X-Forwarded-For

stream:
  memcap: 20gb
  checksum-validation: yes      # reject incorrect csums
  inline: no                  # auto will use inline mode in IPS mode, yes or no set it statically
  reassembly:
    memcap: 20gb
    depth: 2kb                 # reassemble 1mb into a stream
    toserver-chunk-size:  25600
    toclient-chunk-size: 25600
    randomize-chunk-size: no
    #randomize-chunk-range: 10
    #raw: yes
    #segment-prealloc: 20000
    #check-overlap-different-data: true

Where do you log the files and what hardware is this?
Can you check the I/O stats as well?
This could be an I/O issue, especially since the CPU is more idle during that time.
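
Something like this would show whether the disk is busy while the drops happen (the sysstat tools are assumed to be installed):

# extended per-device I/O statistics, refreshed every 2 seconds
iostat -x 2
# disk I/O of the Suricata process only
pidstat -d -p $(pidof suricata) 2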

I only log a few files, and the I/O stats are always low.

[root@sec-audit-lljx-027093 nvme0n1]#  smartctl --all /dev/nvme0n1 
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-862.el7.x86_64] (local build)
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       P5516DS0160T00
Serial Number:                      SH211903989
Firmware Version:                   224005A0
PCI Vendor/Subsystem ID:            0x1c5f
IEEE OUI Identifier:                0x00e0cf
Total NVM Capacity:                 1,600,321,314,816 [1.60 TB]
Unallocated NVM Capacity:           0
Controller ID:                      1
Number of Namespaces:               1
Namespace 1 Size/Capacity:          1,600,321,314,816 [1.60 TB]
Namespace 1 Formatted LBA Size:     512
Namespace 1 IEEE EUI-64:            38b19e 734c0f9501
Local Time is:                      Tue Sep 20 10:42:58 2022 CST
Firmware Updates (0x07):            3 Slots, Slot 1 R/O
Optional Admin Commands (0x000e):   Format Frmw_DL NS_Mngmt
Optional NVM Commands (0x0014):     DS_Mngmt Sav/Sel_Feat
Maximum Data Transfer Size:         32 Pages
Warning  Comp. Temp. Threshold:     70 Celsius
Critical Comp. Temp. Threshold:     80 Celsius

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +    14.00W       -        -    0  0  0  0        0       0
 1 +    13.00W       -        -    1  1  1  1        0       0
 2 +    12.00W       -        -    2  2  2  2        0       0
 3 +    11.00W       -        -    3  3  3  3        0       0
 4 +    10.00W       -        -    4  4  4  4        0       0
 5 +     9.00W       -        -    5  5  5  5        0       0
 6 +     8.00W       -        -    6  6  6  6        0       0
 7 +     7.00W       -        -    7  7  7  7        0       0

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         2
 1 -    4096       0         0
 2 -     512       0         2
 3 -    4096       0         0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        37 Celsius
Available Spare:                    100%
Available Spare Threshold:          5%
Percentage Used:                    0%
Data Units Read:                    2,007,404 [1.02 TB]
Data Units Written:                 1,038,711 [531 GB]
Host Read Commands:                 7,861,706
Host Write Commands:                4,069,800
Controller Busy Time:               12
Power Cycles:                       10
Power On Hours:                     10,134
Unsafe Shutdowns:                   2
Media and Data Integrity Errors:    0
Error Information Log Entries:      0
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0
Temperature Sensor 1:               32 Celsius
Temperature Sensor 2:               27 Celsius

Error Information (NVMe Log 0x01, max 63 entries)
No Errors Logged


Can you post your whole suricata.yaml (remove confidential settings)?
What rulesets do you use and are there also custom rules?
Also the whole suricata.log would help to see if there is something off.
33MB doesn’t sound like a lot of files to be stored.

I have only one custom rule:

alert http any any <> any any (msg:"FILE info all"; http.content_type; content:"application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"; filestore; sid:130003; rev:1;)
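
(If useful for narrowing this down, the rule can be exercised in isolation against a capture file; -S loads only the given rule file, and the paths here are just examples.)

suricata -c /etc/suricata/suricata.yaml -S /etc/suricata/rules/local.rules -r /tmp/sample.pcap -l /tmp/suri-filestore-test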

Following is my configuration file.
suricata.yaml (74.2 KB)

suricata.log

And is suricata.rules the ET ruleset?

What happens if you just remove the custom rule?

And do you see those syscall errors on all interfaces? I’m not familiar with PF_RING but I would investigate those errors as well.

The file suricata.rules is empty.
Those errors don't seem to matter, so I am confused.

So you have just one single rule, the custom rule?
How much traffic is currently forwarded?
You made sure that the current build doesn’t use profiling anymore?
You could try AF_PACKET again to rule out that it's something with PF_RING.
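
For the AF_PACKET re-test, something along these lines should be enough; the interface name is taken from your earlier output, the rest is just an example:

suricata --af-packet=eth2 -c /etc/suricata/suricata.yaml -v
# watch the kernel drop counters while it runs (unix-command socket required)
watch -n 5 'suricatasc -c dump-counters | grep -E "kernel_(packets|drops)"'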

Yes.
About 5 Gbit/s of traffic.
Yes, I am not using profiling anymore.
OK, but AF_PACKET can't handle that much traffic.

AF_PACKET can deal with that amount of traffic; you have enough cores for it. I've seen 20 Gbit/s deployments running AF_PACKET with fewer cores.

Especially since you run just one rule, the load is even lower.
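
If the AF_PACKET run still drops at that rate, the usual NIC-side knobs are worth a look first; a sketch assuming eth2 on the mlx5 driver, with example values only:

# match the RSS queue count to the number of capture threads (example: 48)
ethtool -L eth2 combined 48
# enlarge the RX ring; check the supported maximum with ethtool -g eth2 first
ethtool -G eth2 rx 4096
# packet-merging offloads are usually disabled on IDS capture interfaces
ethtool -K eth2 gro off lro off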