High capture.kernel_drops, more than 70%

Hi. First of all, thank you for this forum.

We are having a problem that we do not know how to solve. In our infrastructure we have 6 probes running Suricata. 5 of them work perfectly, but 1 of them, with the same configuration as the rest, more resources, and less traffic, shows kernel packet drops of about 70%.

Counter | TM Name | Value

capture.kernel_packets | Total | 42964913
capture.kernel_drops | Total | 26761938
decoder.pkts | Total | 11388467
decoder.bytes | Total | 4080843211
decoder.ipv4 | Total | 11366363
decoder.ipv6 | Total | 5995
decoder.ethernet | Total | 11388467
decoder.tcp | Total | 3767605
decoder.udp | Total | 7567686
decoder.icmpv4 | Total | 36429
decoder.icmpv6 | Total | 66
decoder.avg_pkt_size | Total | 358
decoder.max_pkt_size | Total | 1514
flow.tcp | Total | 381858
flow.udp | Total | 4465888
flow.icmpv4 | Total | 26837
flow.icmpv6 | Total | 4
defrag.ipv4.fragments | Total | 364
defrag.ipv4.reassembled | Total | 176
decoder.ipv4.opt_pad_required | Total | 148
decoder.ipv6.zero_len_padn | Total | 66
tcp.sessions | Total | 44879
tcp.syn | Total | 44881
tcp.synack | Total | 45095
tcp.rst | Total | 35880
tcp.pkt_on_wrong_thread | Total | 764957
tcp.stream_depth_reached | Total | 1
tcp.reassembly_gap | Total | 77
tcp.overlap | Total | 31147
detect.alert | Total | 1750
app_layer.flow.http | Total | 8
app_layer.tx.http | Total | 8
app_layer.flow.tls | Total | 6
app_layer.flow.smb | Total | 127
app_layer.tx.smb | Total | 4319
app_layer.flow.dcerpc_tcp | Total | 413
app_layer.flow.dns_tcp | Total | 39
app_layer.tx.dns_tcp | Total | 82
app_layer.flow.enip | Total | 46
app_layer.flow.ntp | Total | 320
app_layer.flow.krb5_tcp | Total | 907
app_layer.tx.krb5_tcp | Total | 895
app_layer.flow.dhcp | Total | 40
app_layer.flow.failed_tcp | Total | 1156
app_layer.flow.dcerpc_udp | Total | 36
app_layer.flow.dns_udp | Total | 2308628
app_layer.tx.dns_udp | Total | 3294868
app_layer.tx.enip | Total | 66
app_layer.tx.ntp | Total | 251
app_layer.flow.krb5_udp | Total | 29
app_layer.tx.krb5_udp | Total | 2
app_layer.tx.dhcp | Total | 200
app_layer.flow.failed_udp | Total | 2156789
flow_mgr.closed_pruned | Total | 2419
flow_mgr.new_pruned | Total | 4669636
flow_mgr.est_pruned | Total | 202532
flow.spare | Total | 1048576
flow.tcp_reuse | Total | 540
flow_mgr.flows_checked | Total | 867
flow_mgr.flows_timeout | Total | 867
flow_mgr.flows_removed | Total | 867
flow_mgr.rows_checked | Total | 1048576
flow_mgr.rows_skipped | Total | 1044811
flow_mgr.rows_empty | Total | 2898
flow_mgr.rows_maxlen | Total | 1
tcp.memuse | Total | 392000000
tcp.reassembly_memuse | Total | 2752512
flow.memuse | Total | 387042112

Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 32
On-line CPU(s) list: 0-31
Thread(s) per core: 2
Core(s) per socket: 8
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 47
Model name: Intel(R) Xeon(R) CPU E7- 4830 @ 2.13GHz
Stepping: 2
CPU MHz: 2128.015
CPU max MHz: 2129.0000
CPU min MHz: 1064.0000
BogoMIPS: 4255.98
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 24576K
NUMA node0 CPU(s): 0-7,16-23
NUMA node1 CPU(s): 8-15,24-31
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt aes lahf_lm epb pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid dtherm ida arat flush_l1d

125 GB RAM, and normal traffic of about 900 Mb/s.

   description: Ethernet interface
   product: 82599ES 10-Gigabit SFI/SFP+ Network Connection
   vendor: Intel Corporation
   physical id: 0
   bus info: pci@0000:95:00.0
   logical name: eth12
   version: 01
   serial: 90:e2:ba:a0:78:20
   size: 10Gbit/s
   capacity: 10Gbit/s
   width: 64 bits
   clock: 33MHz
   capabilities: pm msi msix pciexpress vpd bus_master cap_list rom ethernet physical fibre 10000bt-fd
   configuration: autonegotiation=off broadcast=yes driver=ixgbe driverversion=5.1.0-k duplex=full firmware=0x8000081f latency=0 link=yes multicast=yes port=fibre promiscuous=yes speed=10Gbit/s
   resources: irq:44 memory:c0c00000-c0cfffff ioport:9fc0(size=32) memory:c0afc000-c0afffff memory:c0a00000-c0a7ffff memory:c0100000-c01fffff memory:c0200000-c02fffff

suricata.yaml (58.0 KB)

afpacket_f.yaml (176 Bytes)

Any advice on what we could change? Even a 10% drop would be acceptable; we do not understand why this probe drops so many packets.

Regards,

There could be many reasons why; some of them are:

  • Traffic rate imbalance. What are the traffic rates seen by each probe? Is the traffic equally distributed? (A quick way to measure this is sketched at the end of this post.)
  • Traffic disparities. Is the sensor that’s dropping packets seeing large (elephant) high-speed flows?
  • Hardware differences. Are each of the sensors identical with respect to hardware configuration?
  • OS. Same question, though I'm assuming they are all running the same OS and patch levels.
  • Suricata version. Which Suricata version are you using?
  • Suricata rulesets. I'd expect the same ruleset to be loaded on each of the sensors; is that the case?
  • Resource allocation. Are the CPU cores used by Suricata isolated? That is, protected so that Linux doesn't schedule other jobs on them? See isolcpus for reference; a minimal check is sketched just below this list.
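
A minimal sketch of that isolation check. The core list in the GRUB example is illustrative, not taken from your box; whatever you choose should match the worker-cpu-set in the cpu-affinity section of your suricata.yaml:

    # Is isolcpus set on the kernel command line?
    grep -o 'isolcpus=[^ ]*' /proc/cmdline || echo "no isolcpus set"

    # Example GRUB setting (/etc/default/grub, then update-grub and reboot);
    # the 2-15 range here is only an example:
    #   GRUB_CMDLINE_LINUX="... isolcpus=2-15"

    # Which CPUs are Suricata's threads actually allowed to run on?
    taskset -apc "$(pidof suricata)"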

This list is by no means complete, but it is a starting point for characterizing what might be occurring.
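
For the first bullet, a rough way to compare ingress rates across probes is to sample the kernel's rx_bytes counter on each one. This is only a sketch: the interface name eth12 is taken from your lshw output, and the 10-second window is an arbitrary choice:

    #!/bin/bash
    # Average ingress rate over a 10 s window; run on each probe and compare.
    IFACE=${1:-eth12}
    RX1=$(cat "/sys/class/net/$IFACE/statistics/rx_bytes")
    sleep 10
    RX2=$(cat "/sys/class/net/$IFACE/statistics/rx_bytes")
    echo "$(( (RX2 - RX1) * 8 / 10 / 1000000 )) Mbit/s on $IFACE"

Run it on all six sensors at roughly the same time; if the numbers are similar but only this probe drops packets, the rate-imbalance hypothesis can be ruled out quickly.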