High packet drop rate in DPDK runmode

Hi, when I run Suricata in DPDK runmode and replay a pcap file to the DPDK port, Suricata reports a high packet drop rate.
But when I switch to AF-Packet mode, everything seems fine.

OS: Ubuntu 20
Memory: 128 GB
DPDK version: 21.11.1
NIC: Intel 10-Gigabit X540-AT2

suricata.yaml:

dpdk:
  eal-params:
    proc-type: primary

  # DPDK capture support
  # RX queues (and TX queues in IPS mode) are assigned to cores in 1:1 ratio
  interfaces:
    - interface: 0000:5e:00.0 # PCIe address of the NIC port
      # Threading: possible values are either "auto" or number of threads
      # - auto takes all cores
      # in IPS mode it is required to specify the number of cores and the numbers on both interfaces must match
      threads: auto
      promisc: true # promiscuous mode - capture all packets
      multicast: true # enables also detection on multicast packets
      checksum-checks: true # if Suricata should validate checksums
      checksum-checks-offload: true # if possible offload checksum validation to the NIC (saves Suricata resources)
      mtu: 1500 # Set MTU of the device in bytes

      # To approximately calculate required amount of space (in bytes) for interface's mempool: mempool-size * mtu
      # Make sure you have enough allocated hugepages.
      # The optimum size for the packet memory pool (in terms of memory usage) is power of two minus one: n = (2^q - 1)
      mempool-size: 65535 # The number of elements in the mbuf pool

      # Mempool cache size must be lower or equal to:
      #     - RTE_MEMPOOL_CACHE_MAX_SIZE (by default 512) and
      #     - "mempool-size / 1.5"
      # It is advised to choose cache_size to have "mempool-size modulo cache_size == 0".
      # If this is not the case, some elements will always stay in the pool and will never be used.
      # The cache can be disabled if the cache_size argument is set to 0, can be useful to avoid losing objects in cache
      # If the value is empty or set to "auto", Suricata will attempt to set cache size of the mempool to a value
      # that matches the previously mentioned recommendations
      mempool-cache-size: 257
      rx-descriptors: 1024
      tx-descriptors: 1024
      #
      # IPS mode for Suricata works in 3 modes - none, tap, ips
      # - none: IDS mode only - disables IPS functionality (does not further forward packets)
      # - tap: forwards all packets and generates alerts (omits DROP action) This is not DPDK TAP
      # - ips: the same as tap mode but it also drops packets that are flagged by rules to be dropped
      copy-mode: none
      copy-iface: none # or PCIe address of the second interface

    - interface: default
      threads: auto
      promisc: true
      multicast: true
      checksum-checks: true
      checksum-checks-offload: true
      mtu: 1500
      mempool-size: 65535
      mempool-cache-size: 257
      rx-descriptors: 1024
      tx-descriptors: 1024
      copy-mode: none
      copy-iface: none

Huge page info:

grep Huge /proc/meminfo
AnonHugePages:         0 kB
ShmemHugePages:        0 kB
FileHugePages:         0 kB
HugePages_Total:   16384
HugePages_Free:    16383
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
Hugetlb:        33554432 kB

dpdk runmode result:

[1467] 7/9/2022 -- 06:53:51 - (suricata.c:1146) <Notice> (LogVersion) -- This is Suricata version 7.0.0-dev running in SYSTEM mode
[1467] 7/9/2022 -- 06:53:51 - (util-classification-config.c:138) <Warning> (SCClassConfInitContextAndLocalResources) -- [ERRCODE: SC_ERR_FOPEN(44)] - could not open: "/usr/local/etc/suricata/classification.config": No such file or directory
[1467] 7/9/2022 -- 06:53:51 - (util-classification-config.c:538) <Error> (SCClassConfLoadClassficationConfigFile) -- [ERRCODE: SC_ERR_OPENING_FILE(40)] - please check the "classification-file" option in your suricata.yaml file
[1467] 7/9/2022 -- 06:53:51 - (util-reference-config.c:129) <Error> (SCRConfInitContextAndLocalResources) -- [ERRCODE: SC_ERR_FOPEN(44)] - Error opening file: "/usr/local/etc/suricata/reference.config": No such file or directory
[1467] 7/9/2022 -- 06:53:51 - (util-reference-config.c:505) <Error> (SCRConfLoadReferenceConfigFile) -- [ERRCODE: SC_ERR_OPENING_FILE(40)] - please check the "reference-config-file" option in your suricata.yaml file
[1467] 7/9/2022 -- 06:53:51 - (detect-engine-loader.c:239) <Warning> (ProcessSigFiles) -- [ERRCODE: SC_ERR_NO_RULES(42)] - No rule files match the pattern /usr/local/var/lib/suricata/rules/suricata.rules
[1467] 7/9/2022 -- 06:53:51 - (detect-engine-loader.c:354) <Warning> (SigLoadSignatures) -- [ERRCODE: SC_ERR_NO_RULES_LOADED(43)] - 1 rule files specified, but no rules were loaded!
[1467] 7/9/2022 -- 06:53:51 - (util-threshold-config.c:257) <Warning> (SCThresholdConfInitContext) -- [ERRCODE: SC_ERR_FOPEN(44)] - Error opening file: "/usr/local/etc/suricata//threshold.config": No such file or directory
EAL: No available 1048576 kB hugepages reported
EAL: DPDK is running on a NUMA system, but is compiled without NUMA support.
EAL: This will have adverse consequences for performance and usability.
EAL: Please use --legacy-mem option, or recompile with NUMA support.
TELEMETRY: No legacy callbacks, legacy socket not created
[1467] 7/9/2022 -- 06:53:52 - (unix-manager.c:144) <Error> (UnixNew) -- [ERRCODE: SC_ERR_INITIALIZATION(45)] - Cannot create socket directory /usr/local/var/run/suricata/: No such file or directory
[1467] 7/9/2022 -- 06:53:52 - (unix-manager.c:1050) <Warning> (UnixManagerInit) -- [ERRCODE: SC_ERR_INITIALIZATION(45)] - Unable to create unix command socket
[1467] 7/9/2022 -- 06:53:52 - (tm-threads.c:1927) <Notice> (TmThreadWaitOnThreadInit) -- Threads created -> W: 12 FM: 1 FR: 1   Engine started.
^C[1467] 7/9/2022 -- 06:54:11 - (suricata.c:2774) <Notice> (SuricataMainLoop) -- Signal Received.  Stopping engine.
[1467] 7/9/2022 -- 06:54:13 - (util-device.c:355) <Notice> (LiveDeviceListClean) -- Stats for '0000:5e:00.0':  pkts: 314059, drop: 158818 (50.57%), invalid chksum: 0

stats.log:

Date: 9/7/2022 -- 01:25:26 (uptime: 0d, 00h 00m 16s)
------------------------------------------------------------------------------------
Counter                                       | TM Name                   | Value
------------------------------------------------------------------------------------
capture.packets                               | Total                     | 314051
capture.rx_errors                             | Total                     | 211896
capture.dpdk.imissed                          | Total                     | 138397
capture.dpdk.ierrors                          | Total                     | 73499
decoder.pkts                                  | Total                     | 102155
decoder.bytes                                 | Total                     | 108240640
decoder.invalid                               | Total                     | 1
decoder.ipv4                                  | Total                     | 102129
decoder.ipv6                                  | Total                     | 23
decoder.ethernet                              | Total                     | 102155
decoder.tcp                                   | Total                     | 102083
decoder.udp                                   | Total                     | 50
decoder.icmpv6                                | Total                     | 12
decoder.avg_pkt_size                          | Total                     | 1059
decoder.max_pkt_size                          | Total                     | 1514
flow.total                                    | Total                     | 106
flow.tcp                                      | Total                     | 89
flow.udp                                      | Total                     | 13
flow.icmpv6                                   | Total                     | 4
flow.wrk.spare_sync_avg                       | Total                     | 100
flow.wrk.spare_sync                           | Total                     | 12
decoder.event.ipv4.iplen_smaller_than_hlen    | Total                     | 1
decoder.event.ipv4.opt_pad_required           | Total                     | 6
decoder.event.ipv6.zero_len_padn              | Total                     | 8
flow.wrk.flows_evicted_needs_work             | Total                     | 52
flow.wrk.flows_evicted_pkt_inject             | Total                     | 62
flow.wrk.flows_injected                       | Total                     | 52
tcp.sessions                                  | Total                     | 74
tcp.syn                                       | Total                     | 90
tcp.synack                                    | Total                     | 68
tcp.rst                                       | Total                     | 60
tcp.stream_depth_reached                      | Total                     | 2
tcp.reassembly_gap                            | Total                     | 10
app_layer.flow.http                           | Total                     | 49
app_layer.tx.http                             | Total                     | 98
app_layer.flow.tls                            | Total                     | 11
app_layer.error.tls.gap                       | Total                     | 1
app_layer.flow.dhcp                           | Total                     | 1
app_layer.tx.dhcp                             | Total                     | 3
app_layer.flow.failed_udp                     | Total                     | 12
flow.end.state.new                            | Total                     | 34
flow.end.state.established                    | Total                     | 6
flow.end.state.closed                         | Total                     | 66
flow.end.tcp_state.syn_sent                   | Total                     | 6
flow.end.tcp_state.established                | Total                     | 2
flow.end.tcp_state.last_ack                   | Total                     | 2
flow.end.tcp_state.closed                     | Total                     | 64
flow.end.tcp_liberal                          | Total                     | 5
flow.mgr.full_hash_pass                       | Total                     | 1
flow.mgr.rows_per_sec                         | Total                     | 6553
flow.spare                                    | Total                     | 10100
flow.mgr.rows_maxlen                          | Total                     | 2
flow.mgr.flows_checked                        | Total                     | 98
flow.mgr.flows_notimeout                      | Total                     | 98
memcap_pressure                               | Total                     | 10
memcap_pressure_max                           | Total                     | 10
flow.recycler.recycled                        | Total                     | 54
flow.recycler.queue_avg                       | Total                     | 3
flow.recycler.queue_max                       | Total                     | 54
tcp.memuse                                    | Total                     | 7274496
tcp.reassembly_memuse                         | Total                     | 1376256
http.memuse                                   | Total                     | 34070
flow.memuse                                   | Total                     | 7810304
------------------------------------------------------------------------------------

AF-Packet runmode result:
[1983] 7/9/2022 -- 07:33:57 - (util-device.c:355) <Notice> (LiveDeviceListClean) -- Stats for 'ens3f0': pkts: 240490, drop: 0 (0.00%), invalid chksum: 73488

Also, when I try it on another server with a lower profile (NIC: 82576 Gigabit Network Connection, 8 GB RAM, 4 cores, DPDK 21.11), the drop rate is also low.

logs:

[4578] 7/9/2022 -- 10:21:10 - (suricata.c:1147) <Notice> (LogVersion) -- This is Suricata version 7.0.0-dev running in SYSTEM mode
[4578] 7/9/2022 -- 10:21:10 - (util-classification-config.c:139) <Warning> (SCClassConfInitContextAndLocalResources) -- [ERRCODE: SC_ERR_FOPEN(44)] - could not open: "/usr/local/etc/suricata/classification.config": No such file or directory
[4578] 7/9/2022 -- 10:21:10 - (util-classification-config.c:539) <Error> (SCClassConfLoadClassficationConfigFile) -- [ERRCODE: SC_ERR_OPENING_FILE(40)] - please check the "classification-file" option in your suricata.yaml file
[4578] 7/9/2022 -- 10:21:10 - (util-reference-config.c:130) <Error> (SCRConfInitContextAndLocalResources) -- [ERRCODE: SC_ERR_FOPEN(44)] - Error opening file: "/usr/local/etc/suricata/reference.config": No such file or directory
[4578] 7/9/2022 -- 10:21:10 - (util-reference-config.c:506) <Error> (SCRConfLoadReferenceConfigFile) -- [ERRCODE: SC_ERR_OPENING_FILE(40)] - please check the "reference-config-file" option in your suricata.yaml file
[4578] 7/9/2022 -- 10:21:10 - (detect-engine-loader.c:239) <Warning> (ProcessSigFiles) -- [ERRCODE: SC_ERR_NO_RULES(42)] - No rule files match the pattern /usr/local/var/lib/suricata/rules/suricata.rules
[4578] 7/9/2022 -- 10:21:10 - (util-threshold-config.c:257) <Warning> (SCThresholdConfInitContext) -- [ERRCODE: SC_ERR_FOPEN(44)] - Error opening file: "/usr/local/etc/suricata//threshold.config": No such file or directory
EAL: No available 1048576 kB hugepages reported
TELEMETRY: No legacy callbacks, legacy socket not created
[4578] 7/9/2022 -- 10:21:11 - (runmode-dpdk.c:921) <Warning> (DeviceInitPortConf) -- [ERRCODE: SC_WARN_DPDK_CONF(344)] - Interface 0000:02:00.0 modified RSS hash function based on hardware support, requested:0xa38c configured:0x8104
[4588] 7/9/2022 -- 10:21:11 - (log-pcap.c:1047) <Notice> (PcapLogInitRingBuffer) -- Ring buffer initialized with 2 files.
[4595] 7/9/2022 -- 10:21:11 - (log-pcap.c:1047) <Notice> (PcapLogInitRingBuffer) -- Ring buffer initialized with 2 files.
[4598] 7/9/2022 -- 10:21:11 - (log-pcap.c:1047) <Notice> (PcapLogInitRingBuffer) -- Ring buffer initialized with 2 files.
[4600] 7/9/2022 -- 10:21:12 - (log-pcap.c:1047) <Notice> (PcapLogInitRingBuffer) -- Ring buffer initialized with 3 files.
[4578] 7/9/2022 -- 10:21:12 - (unix-manager.c:146) <Error> (UnixNew) -- [ERRCODE: SC_ERR_INITIALIZATION(45)] - Cannot create socket directory /usr/local/var/run/suricata/: No such file or directory
[4578] 7/9/2022 -- 10:21:12 - (unix-manager.c:1051) <Warning> (UnixManagerInit) -- [ERRCODE: SC_ERR_INITIALIZATION(45)] - Unable to create unix command socket
[4578] 7/9/2022 -- 10:21:12 - (tm-threads.c:1927) <Notice> (TmThreadWaitOnThreadInit) -- Threads created -> W: 4 FM: 1 FR: 1   Engine started.
^C[4578] 7/9/2022 -- 10:21:40 - (suricata.c:2774) <Notice> (SuricataMainLoop) -- Signal Received.  Stopping engine.
[4578] 7/9/2022 -- 10:21:41 - (util-device.c:359) <Notice> (LiveDeviceListClean) -- Stats for '0000:02:00.0':  pkts: 240569, drop: 351 (0.15%), invalid chksum: 0

There must be something wrong. Please help me figure it out. Thanks.

Hi @superKT,

a common mistake is not setting cpu-affinity in the suricata.yaml file.

I would suggest the following:

  • in the dpdk: configuration section, set threads to an explicit number, e.g. 4 (not auto)
  • in the CPU-affinity configuration section, enable set-cpu-affinity and define which cores the Suricata workers should run on (e.g. 2,3,4,5)

Try it out, and if it does not help, attach the output of lscpu and your whole suricata.yaml file.

This behavior is easy to spot with e.g. htop, because a DPDK application polls for packets continuously and therefore keeps all assigned cores busy all the time (even with no traffic). If you only see CPU load on the first core, then the configuration is incorrect.
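The two suggestions above would look roughly like this in suricata.yaml (a sketch only; the core numbers are just the example values 2,3,4,5 from above and must be adapted to your machine):

```yaml
# Sketch of the suggested settings; core IDs are illustrative.
threading:
  set-cpu-affinity: yes
  cpu-affinity:
    - management-cpu-set:
        cpu: [ 0 ]            # management/flow-manager threads
    - worker-cpu-set:
        cpu: [ 2, 3, 4, 5 ]   # DPDK worker threads, one per RX queue
        mode: "exclusive"

dpdk:
  interfaces:
    - interface: 0000:5e:00.0
      threads: 4              # explicit count instead of "auto"
```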

Hi Lukas,
I’ve tried your solution and the packet loss rate is reduced, but there is still about a 20% drop rate.

output log:

[18515] 8/9/2022 -- 02:35:52 - (suricata.c:1146) <Notice> (LogVersion) -- This is Suricata version 7.0.0-dev running in SYSTEM mode
[18515] 8/9/2022 -- 02:35:52 - (util-classification-config.c:138) <Warning> (SCClassConfInitContextAndLocalResources) -- [ERRCODE: SC_ERR_FOPEN(44)] - could not open: "/usr/local/etc/suricata/classification.config": No such file or directory
[18515] 8/9/2022 -- 02:35:52 - (util-classification-config.c:538) <Error> (SCClassConfLoadClassficationConfigFile) -- [ERRCODE: SC_ERR_OPENING_FILE(40)] - please check the "classification-file" option in your suricata.yaml file
[18515] 8/9/2022 -- 02:35:52 - (util-reference-config.c:129) <Error> (SCRConfInitContextAndLocalResources) -- [ERRCODE: SC_ERR_FOPEN(44)] - Error opening file: "/usr/local/etc/suricata/reference.config": No such file or directory
[18515] 8/9/2022 -- 02:35:52 - (util-reference-config.c:505) <Error> (SCRConfLoadReferenceConfigFile) -- [ERRCODE: SC_ERR_OPENING_FILE(40)] - please check the "reference-config-file" option in your suricata.yaml file
[18515] 8/9/2022 -- 02:35:52 - (detect-engine-loader.c:239) <Warning> (ProcessSigFiles) -- [ERRCODE: SC_ERR_NO_RULES(42)] - No rule files match the pattern /usr/local/var/lib/suricata/rules/suricata.rules
[18515] 8/9/2022 -- 02:35:52 - (detect-engine-loader.c:354) <Warning> (SigLoadSignatures) -- [ERRCODE: SC_ERR_NO_RULES_LOADED(43)] - 1 rule files specified, but no rules were loaded!
[18515] 8/9/2022 -- 02:35:52 - (util-threshold-config.c:257) <Warning> (SCThresholdConfInitContext) -- [ERRCODE: SC_ERR_FOPEN(44)] - Error opening file: "/usr/local/etc/suricata//threshold.config": No such file or directory
EAL: No available 1048576 kB hugepages reported
TELEMETRY: No legacy callbacks, legacy socket not created
[18531] 8/9/2022 -- 02:35:52 - (source-dpdk.c:477) <Warning> (ReceiveDPDKThreadInit) -- [ERRCODE: SC_WARN_DPDK_CONF(344)] - NIC on NUMA 0 but thread on NUMA 1. Decreased performance expected
[18533] 8/9/2022 -- 02:35:52 - (source-dpdk.c:477) <Warning> (ReceiveDPDKThreadInit) -- [ERRCODE: SC_WARN_DPDK_CONF(344)] - NIC on NUMA 0 but thread on NUMA 1. Decreased performance expected
[18535] 8/9/2022 -- 02:35:53 - (source-dpdk.c:477) <Warning> (ReceiveDPDKThreadInit) -- [ERRCODE: SC_WARN_DPDK_CONF(344)] - NIC on NUMA 0 but thread on NUMA 1. Decreased performance expected
[18537] 8/9/2022 -- 02:35:53 - (source-dpdk.c:477) <Warning> (ReceiveDPDKThreadInit) -- [ERRCODE: SC_WARN_DPDK_CONF(344)] - NIC on NUMA 0 but thread on NUMA 1. Decreased performance expected
[18515] 8/9/2022 -- 02:35:53 - (unix-manager.c:144) <Error> (UnixNew) -- [ERRCODE: SC_ERR_INITIALIZATION(45)] - Cannot create socket directory /usr/local/var/run/suricata/: No such file or directory
[18515] 8/9/2022 -- 02:35:53 - (unix-manager.c:1050) <Warning> (UnixManagerInit) -- [ERRCODE: SC_ERR_INITIALIZATION(45)] - Unable to create unix command socket
[18515] 8/9/2022 -- 02:35:53 - (tm-threads.c:1927) <Notice> (TmThreadWaitOnThreadInit) -- Threads created -> W: 8 FM: 1 FR: 1   Engine started.
^C[18515] 8/9/2022 -- 02:36:45 - (suricata.c:2774) <Notice> (SuricataMainLoop) -- Signal Received.  Stopping engine.
[18515] 8/9/2022 -- 02:36:47 - (util-device.c:355) <Notice> (LiveDeviceListClean) -- Stats for '0000:5e:00.0':  pkts: 314060, drop: 73499 (23.40%), invalid chksum: 0

suricata.yaml (76.3 KB)

When I run Suricata, the cores are busy.

lscpu:

Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   46 bits physical, 48 bits virtual
CPU(s):                          12
On-line CPU(s) list:             0-11
Thread(s) per core:              1
Core(s) per socket:              6
Socket(s):                       2
NUMA node(s):                    2
Vendor ID:                       GenuineIntel
CPU family:                      6
Model:                           85
Model name:                      Intel(R) Xeon(R) Bronze 3104 CPU @ 1.70GHz
Stepping:                        4
CPU MHz:                         800.526
CPU max MHz:                     1700.0000
CPU min MHz:                     800.0000
BogoMIPS:                        3400.00
Virtualization:                  VT-x
L1d cache:                       384 KiB
L1i cache:                       384 KiB
L2 cache:                        12 MiB
L3 cache:                        16.5 MiB
NUMA node0 CPU(s):               0,2,4,6,8,10
NUMA node1 CPU(s):               1,3,5,7,9,11
Vulnerability Itlb multihit:     KVM: Mitigation: Split huge pages
Vulnerability L1tf:              Mitigation; PTE Inversion; VMX conditional cache flushes, SMT disabled
Vulnerability Mds:               Mitigation; Clear CPU buffers; SMT disabled
Vulnerability Meltdown:          Mitigation; PTI
Vulnerability Mmio stale data:   Mitigation; Clear CPU buffers; SMT disabled
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1:        Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:        Mitigation; Retpolines, IBPB conditional, IBRS_FW, STIBP disabled, RSB filling
Vulnerability Srbds:             Not affected
Vulnerability Tsx async abort:   Mitigation; Clear CPU buffers; SMT disabled
Flags:                           fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single pti ssbd mba ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb intel_pt avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm arat pln pts pku ospke md_clear flush_l1d arch_capabilities

Please help me figure it out. Thanks.

As you can see in the Suricata warnings, some of your worker cores are on a different NUMA node than your NIC. The lscpu output shows which cores belong to each NUMA node. I was not aware in my previous comment that you have a 2-NUMA-node server. I would suggest consolidating the whole Suricata deployment on the NUMA node your NIC is connected to. So, for example, if your NIC is connected to NUMA0, use the cores of NUMA0 (the even-numbered cores). Assign a separate core for your management thread.
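One way to check which NUMA node the NIC sits on is via sysfs; this is a generic Linux sketch (the PCIe address is the one from your logs, and a value of -1 means the platform did not report a node):

```shell
# Query the NUMA node of the NIC; adjust the PCIe address for your system.
pci=0000:5e:00.0
node=$(cat "/sys/bus/pci/devices/${pci}/numa_node" 2>/dev/null || echo "-1")
echo "NIC ${pci} is on NUMA node ${node}"
# Cross-check against lscpu to pick worker cores from the same node,
# e.g. "NUMA node0 CPU(s): 0,2,4,6,8,10" in this thread.
lscpu | grep -i "numa" || true
```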

Would AF-Packet with the same CPU affinity settings receive all packets for you?

It is also interesting that your screenshot shows load at ~92% rather than 99%, but that may just be a minor thing.

Otherwise, could you try doubling the number of RX/TX descriptors?
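In suricata.yaml that means changing two lines in the interface block of the dpdk: section (2048 here is simply double the posted value; the NIC must support the requested ring size):

```yaml
# Illustrative values - double the ring sizes from the posted config.
rx-descriptors: 2048
tx-descriptors: 2048
```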

Lukas

I think I found the cause of the problem. I had been replaying the same pcap every time, and that pcap always causes packet loss; the number of dropped packets is always the same:

pkts: 314107, drop: 73499 (23.40%), invalid chksum: 0

When I replay many other pcaps instead, the drop rate is almost 0%.

However, AF-Packet receives all packets of this same pcap with no loss.

9/9/2022 -- 06:53:44 - (util-device.c:355) <Notice> (LiveDeviceListClean) -- Stats for 'ens3f0':  pkts: 240962, drop: 0 (0.00%), invalid chksum: 73487

Also, I replayed this suspicious pcap on the other server running Suricata+DPDK (mentioned before), and almost no packet drops occurred there.

That makes me confused. Is there something wrong with my configuration?

That’s very weird… the logic for parsing packets should be the same for AF-Packet and DPDK.
Are you using X540 NICs on both systems?
The thing I have just noticed is that AFP reports almost the same number of invalid checksums as DPDK reports drops. Maybe that is the difference between the two modes; it would also explain why you see the same number of dropped packets in DPDK mode.

Just as a test, you could disable both checksum validation options in the dpdk section of the suricata.yaml config file.

Hey @superKT,

just reaching out to ask whether you were able to diagnose the problem or verify my suggestion regarding the count of invalid checksums.
I was trying to reproduce the issue by replaying a packet with an invalid TCP checksum to Suricata, but DPDK counts that packet as received and never classifies it as an invalid checksum (whereas AF-Packet does), so I’ll have a look at that. However, I failed to increase the drop counter with a packet containing an invalid checksum. I tried it on X710 and MLX5 NICs.
Alternatively, would you be able to share the PCAP with me?
Thanks!

Hi Lukas,
Sorry for the late reply; I haven’t been able to use my server lately. I’ll try your advice as soon as I have access to it again.

The NICs I used are the X540 and the 82599; the server with the X540 is the one with the packet drop problem.
I have sent the pcap to your email.
Thanks!

DPDK mode has not been counting packets with invalid checksums. I’ve tried to fix the issue in PR 7893; you can try it out and see if it helps.

Here are my results on:

  • Intel X710 (i40e) - Stats for '0000:19:00.0': pkts: 239545, drop: 0 (0.00%), invalid chksum: 73096
  • MLX5 - Stats for '0000:3b:00.1': pkts: 239543, drop: 0 (0.00%), invalid chksum: 73086

Note that I was using Scapy to send the packets, and even though I set the MTU to 9K it still complained about too-long packets. The point is that I may not have sent all packets from the PCAP, but we are close to the number of packets/invalid checksums reported by AF-Packet.

Additionally, I have not inspected how big the packets in your PCAP are, but I noticed that the MTU in your suricata.yaml is only 1500 bytes; you should increase it toward the 9K limit to make sure you are able to receive all incoming packets.
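In suricata.yaml this is the per-interface mtu option; keep in mind the sizing comment earlier in the thread (mempool memory is roughly mempool-size * mtu), so a larger MTU needs correspondingly more hugepage memory:

```yaml
# Illustrative value: 9000 B covers typical jumbo frames.
# A larger MTU increases mempool memory use (roughly mempool-size * mtu).
mtu: 9000
```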

Thank you!

Hi Lukas,
I tried the suricata-bug-5553-invalid-chksums-v3 version and replayed the PCAP I tested before.

The output on Intel X540 is:

Stats for '0000:5e:00.0':  pkts: 314162, drop: 73499 (23.40%), invalid chksum: 73517

When I disabled the checksum options in suricata.yaml, the output is: Stats for '0000:5e:00.0': pkts: 314092, drop: 73499 (23.40%), invalid chksum: 0

Also, I tried another pcap on the X540; the output is: Stats for '0000:5e:00.0': pkts: 4766867, drop: 26268 (0.55%), invalid chksum: 1321

I think something must be wrong, because the replay speed is not very high and DPDK should not drop any packets.

Thanks!

Jerry, thank you for your report.

The branch you have been trying also contains extra logging of NIC stats; you can enable it with the -vvv switch when starting Suricata.
Can you please post that output as well?

Indeed, if the number of drops is consistent, then I believe the problem lies somewhere around the card and the capture interface.

With this in mind, have you tried increasing the MTU from the default of 1500 B?
Thanks

Sorry for the late reply again!

I used the -vvv option and increased the MTU to 9000. The output is too long, so I put it in a file.
Please help me check the output. Next I will try replacing the NIC and see if I still have the same problem.

Thanks!

output.log (49.5 KB)

Thanks for posting the log. The output shows the number of receive errors matching the drop count:

Port 0 - rx_missed_errors: 204
Port 0 - rx_errors: 73499
.
.
.
Port 0 - rx_l3_l4_xsum_error: 73499
Port 0 - rx_priority0_dropped: 204

So most of these are checksum errors; counting only the genuinely dropped packets (rx_missed_errors) gives 204 / 314105, a drop rate of about 0.06%.

The Suricata summary line, however, folds the rx_l3_l4_xsum_error packets into the drop count instead of reporting them as invalid checksums:

  pkts: 314105, drop: 73703 (23.46%), invalid chksum: 0