Some DPDK worker threads received 0 packets

Hello, I’m running Suricata 7.0 with the command line “suricata --dpdk -vvv”, and I see messages like “(W#17-0000…00.0) received packets 0” for some of the worker threads.
Please see the full messages below:

7/12/2022 -- 00:24:00 - <Info> - time elapsed 153.609s
7/12/2022 -- 00:24:06 - <Perf> - 1740309 flows processed
7/12/2022 -- 00:24:07 - <Perf> - 1790877 flows processed
7/12/2022 -- 00:24:07 - <Perf> - Total RX stats of 0000:3c:00.0: packets 67146697 bytes: 34690689730 missed: 45505062 errors: 0 nombufs: 0
7/12/2022 -- 00:24:07 - <Perf> - (W#01-0000..00.0) received packets 5756428
7/12/2022 -- 00:24:07 - <Perf> - (W#02-0000..00.0) received packets 5675240
7/12/2022 -- 00:24:07 - <Perf> - (W#03-0000..00.0) received packets 5707733
7/12/2022 -- 00:24:07 - <Perf> - (W#04-0000..00.0) received packets 5649414
7/12/2022 -- 00:24:07 - <Perf> - (W#05-0000..00.0) received packets 5693064
7/12/2022 -- 00:24:07 - <Perf> - (W#06-0000..00.0) received packets 5613512
7/12/2022 -- 00:24:07 - <Perf> - (W#07-0000..00.0) received packets 6096739
7/12/2022 -- 00:24:07 - <Perf> - (W#08-0000..00.0) received packets 5711998
7/12/2022 -- 00:24:07 - <Perf> - (W#09-0000..00.0) received packets 2978760
7/12/2022 -- 00:24:07 - <Perf> - (W#10-0000..00.0) received packets 2451203
7/12/2022 -- 00:24:07 - <Perf> - (W#11-0000..00.0) received packets 2443522
7/12/2022 -- 00:24:07 - <Perf> - (W#12-0000..00.0) received packets 2911791
7/12/2022 -- 00:24:07 - <Perf> - (W#13-0000..00.0) received packets 2428270
7/12/2022 -- 00:24:07 - <Perf> - (W#14-0000..00.0) received packets 2826189
7/12/2022 -- 00:24:07 - <Perf> - (W#15-0000..00.0) received packets 2748860
7/12/2022 -- 00:24:07 - <Perf> - (W#16-0000..00.0) received packets 2446266
7/12/2022 -- 00:24:07 - <Perf> - (W#17-0000..00.0) received packets 0
7/12/2022 -- 00:24:07 - <Perf> - (W#18-0000..00.0) received packets 0
7/12/2022 -- 00:24:07 - <Perf> - (W#19-0000..00.0) received packets 0
7/12/2022 -- 00:24:07 - <Perf> - (W#20-0000..00.0) received packets 0
7/12/2022 -- 00:24:07 - <Perf> - (W#21-0000..00.0) received packets 0
7/12/2022 -- 00:24:07 - <Perf> - (W#22-0000..00.0) received packets 0
7/12/2022 -- 00:24:07 - <Perf> - (W#23-0000..00.0) received packets 0
7/12/2022 -- 00:24:07 - <Perf> - (W#24-0000..00.0) received packets 0
7/12/2022 -- 00:24:07 - <Info> - Alerts: 0
7/12/2022 -- 00:24:08 - <Perf> - ippair memory usage: 414144 bytes, maximum: 16777216
7/12/2022 -- 00:24:08 - <Perf> - host memory usage: 398144 bytes, maximum: 33554432
7/12/2022 -- 00:24:08 - <Info> - cleaning up signature grouping structure... complete
7/12/2022 -- 00:24:08 - <Notice> - Stats for '0000:3c:00.0':  pkts: 112651759, drop: 45505062 (40.39%), invalid chksum: 0
7/12/2022 -- 00:24:08 - <Info> - Closing device 0000:3c:00.0

Is there anything wrong in my configuration?

Here is the lscpu output:

Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                48
On-line CPU(s) list:   0-47
Thread(s) per core:    2
Core(s) per socket:    12
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 85
Model name:            Intel(R) Xeon(R) Silver 4214 CPU @ 2.20GHz
Stepping:              7
CPU MHz:               2200.000
BogoMIPS:              4400.00
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              1024K
L3 cache:              16896K
NUMA node0 CPU(s):     0-11,24-35
NUMA node1 CPU(s):     12-23,36-47
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb cat_l3 cdp_l3 intel_ppin intel_pt mba tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local ibpb ibrs stibp dtherm ida arat pln pts pku ospke spec_ctrl intel_stibp arch_capabilities

The dpdk configuration:

dpdk:
  eal-params:
    proc-type: primary

  # DPDK capture support
  # RX queues (and TX queues in IPS mode) are assigned to cores in 1:1 ratio
  interfaces:
    - interface: 0000:3c:00.0 # PCIe address of the NIC port
      # Threading: possible values are either "auto" or number of threads
      # - auto takes all cores
      # in IPS mode it is required to specify the number of cores and the numbers on both interfaces must match
      threads: 24
      promisc: true # promiscuous mode - capture all packets
      multicast: true # enables also detection on multicast packets
      checksum-checks: true # if Suricata should validate checksums
      checksum-checks-offload: true # if possible offload checksum validation to the NIC (saves Suricata resources)
      mtu: 1500 # Set MTU of the device in bytes
      # rss-hash-functions: 0x0 # advanced configuration option, use only if you use untested NIC card and experience RSS warnings,
      # For `rss-hash-functions` use hexadecimal 0x01ab format to specify RSS hash function flags - DumpRssFlags can help (you can see output if you use -vvv option during Suri startup)
      # setting auto to rss_hf sets the default RSS hash functions (based on IP addresses)
      #rss-hash-functions: 0x6d5a

      # To approximately calculate required amount of space (in bytes) for interface's mempool: mempool-size * mtu
      # Make sure you have enough allocated hugepages.
      # The optimum size for the packet memory pool (in terms of memory usage) is power of two minus one: n = (2^q - 1)
      mempool-size: 262143 # The number of elements in the mbuf pool

      # Mempool cache size must be lower or equal to:
      #     - RTE_MEMPOOL_CACHE_MAX_SIZE (by default 512) and
      #     - "mempool-size / 1.5"
      # It is advised to choose cache_size to have "mempool-size modulo cache_size == 0".
      # If this is not the case, some elements will always stay in the pool and will never be used.
      # The cache can be disabled if the cache_size argument is set to 0, can be useful to avoid losing objects in cache
      # If the value is empty or set to "auto", Suricata will attempt to set cache size of the mempool to a value
      # that matches the previously mentioned recommendations
      mempool-cache-size: 511
      rx-descriptors: 8192
      tx-descriptors: 8192
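
(As a rough sanity check of the mempool sizing, using the formula from the comment above and ignoring per-mbuf overhead: 262143 elements * 1500 B MTU ≈ 393 MB, so the hugepage allocation has to cover at least roughly 400 MB for this interface's pool alone.)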

And here is the threading configuration:

threading:
  set-cpu-affinity: yes
  # Tune cpu affinity of threads. Each family of threads can be bound
  # to specific CPUs.
  #
  # These 2 apply to the all runmodes:
  # management-cpu-set is used for flow timeout handling, counters
  # worker-cpu-set is used for 'worker' threads
  #
  # Additionally, for autofp these apply:
  # receive-cpu-set is used for capture threads
  # verdict-cpu-set is used for IPS verdict threads
  #
  cpu-affinity:
    - management-cpu-set:
        cpu: [ 12, 36 ]  # include only these CPUs in affinity settings
    - receive-cpu-set:
        cpu: [ 12 ]  # include only these CPUs in affinity settings
    - worker-cpu-set:
        cpu: [ "0-11", "24-35" ]
        mode: "exclusive"
        # Use explicitly 3 threads and don't compute number by using
        # detect-thread-ratio variable:
        # threads: 3
        prio:
          low: [  ]
          medium: [  ]
          high: [ "0-11", "24-35" ]
          default: "high"

Any help would be greatly appreciated!

Thanks.

What NIC and kernel are you running?
Maybe there is some limit that stops at 16. You could also try 32 threads for testing; perhaps only a power of two works. It would be worth a short test at least.
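
If you want to check the NIC's queue limit directly, one option (a rough sketch, assuming the port is still bound to a DPDK-compatible driver and dpdk-testpmd is available) is to query the port info:

dpdk-testpmd -a 0000:3c:00.0 -- -i
testpmd> show port info 0

and look at the "Max possible RX queues" line in the output.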

Hi @siderui

Have you been able to resolve your issue?

I was looking at your configuration extracts but nothing really stands out…
However, I tried out 16+ workers and everything seems to work; I received traffic on all workers. I based my test on Suricata 7.0.0-rc1 with 18 workers and a Mellanox ConnectX-6 NIC.

Here is my lscpu:

Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              40
On-line CPU(s) list: 0-39
Thread(s) per core:  2
Core(s) per socket:  10
Socket(s):           2
NUMA node(s):        2
Vendor ID:           GenuineIntel
CPU family:          6
Model:               85
Model name:          Intel(R) Xeon(R) Silver 4114 CPU @ 2.20GHz
Stepping:            4
CPU MHz:             2200.000
CPU max MHz:         2200,0000
CPU min MHz:         800,0000
BogoMIPS:            4400.00
Virtualization:      VT-x
L1d cache:           32K
L1i cache:           32K
L2 cache:            1024K
L3 cache:            14080K
NUMA node0 CPU(s):   0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
NUMA node1 CPU(s):   1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39

dpdk conf:

dpdk:
  eal-params:
    proc-type: primary

  # DPDK capture support
  # RX queues (and TX queues in IPS mode) are assigned to cores in 1:1 ratio
  interfaces:
    - interface: 0000:3b:00.1 # PCIe address of the NIC port
      # Threading: possible values are either "auto" or number of threads
      # - auto takes all cores
      # in IPS mode it is required to specify the number of cores and the numbers on both interfaces must match
      threads: 18
      promisc: true # promiscuous mode - capture all packets
      multicast: true # enables also detection on multicast packets
      checksum-checks: true # if Suricata should validate checksums
      checksum-checks-offload: true # if possible offload checksum validation to the NIC (saves Suricata resources)
      mtu: 1500 # Set MTU of the device in bytes
      # rss-hash-functions: 0x0 # advanced configuration option, use only if you use untested NIC card and experience RSS warnings,
      # For `rss-hash-functions` use hexadecimal 0x01ab format to specify RSS hash function flags - DumpRssFlags can help (you can see output if you use -vvv option during Suri startup)
      # setting auto to rss_hf sets the default RSS hash functions (based on IP addresses)

      # To approximately calculate required amount of space (in bytes) for interface's mempool: mempool-size * mtu
      # Make sure you have enough allocated hugepages.
      # The optimum size for the packet memory pool (in terms of memory usage) is power of two minus one: n = (2^q - 1)
      mempool-size: 65535 # The number of elements in the mbuf pool

      # Mempool cache size must be lower or equal to:
      #     - RTE_MEMPOOL_CACHE_MAX_SIZE (by default 512) and
      #     - "mempool-size / 1.5"
      # It is advised to choose cache_size to have "mempool-size modulo cache_size == 0".
      # If this is not the case, some elements will always stay in the pool and will never be used.
      # The cache can be disabled if the cache_size argument is set to 0, can be useful to avoid losing objects in cache
      # If the value is empty or set to "auto", Suricata will attempt to set cache size of the mempool to a value
      # that matches the previously mentioned recommendations
      mempool-cache-size: 257
      rx-descriptors: 1024
      tx-descriptors: 1024
      #
      # IPS mode for Suricata works in 3 modes - none, tap, ips
      # - none: IDS mode only - disables IPS functionality (does not further forward packets)
      # - tap: forwards all packets and generates alerts (omits DROP action) This is not DPDK TAP
      # - ips: the same as tap mode but it also drops packets that are flagged by rules to be dropped
      copy-mode: none
      copy-iface: none # or PCIe address of the second interface

threading:
  set-cpu-affinity: yes
  # Tune cpu affinity of threads. Each family of threads can be bound
  # to specific CPUs.
  #
  # These 2 apply to the all runmodes:
  # management-cpu-set is used for flow timeout handling, counters
  # worker-cpu-set is used for 'worker' threads
  #
  # Additionally, for autofp these apply:
  # receive-cpu-set is used for capture threads
  # verdict-cpu-set is used for IPS verdict threads
  #
  cpu-affinity:
    - management-cpu-set:
        cpu: [ 0 ]  # include only these CPUs in affinity settings
    - receive-cpu-set:
        cpu: [ 0 ]  # include only these CPUs in affinity settings
    - worker-cpu-set:
        cpu: [ 2,4,6,8,10,12,14,16,18,22,24,26,28,30,32,34,36,38 ]
        mode: "exclusive"

Pretty much defaults from the generated suricata.yaml file where I only changed NIC PCIe address, threads (within DPDK section), and worker-cpu-set (within threading section).

Tail of the output:

[166965] Perf: dpdk: Total RX stats of 0000:3b:00.1: packets 999957 bytes: 759990562 missed: 42 errors: 0 nombufs: 0
[166965] Perf: dpdk: (W#01-3b:00.1) received packets 44349
[166966] Perf: dpdk: (W#02-3b:00.1) received packets 67055
[166967] Perf: dpdk: (W#03-3b:00.1) received packets 72336
[166968] Perf: dpdk: (W#04-3b:00.1) received packets 52564
[166969] Perf: dpdk: (W#05-3b:00.1) received packets 84288
[166970] Perf: dpdk: (W#06-3b:00.1) received packets 38471
[166971] Perf: dpdk: (W#07-3b:00.1) received packets 31397
[166972] Perf: dpdk: (W#08-3b:00.1) received packets 44704
[166973] Perf: dpdk: (W#09-3b:00.1) received packets 73968
[166974] Perf: dpdk: (W#10-3b:00.1) received packets 70092
[166975] Perf: dpdk: (W#11-3b:00.1) received packets 37563
[166976] Perf: dpdk: (W#12-3b:00.1) received packets 50650
[166977] Perf: dpdk: (W#13-3b:00.1) received packets 54413
[166978] Perf: dpdk: (W#14-3b:00.1) received packets 54987
[166979] Perf: dpdk: (W#15-3b:00.1) received packets 96055
[166980] Perf: dpdk: (W#16-3b:00.1) received packets 55864
[166981] Perf: dpdk: (W#17-3b:00.1) received packets 40549
[166982] Perf: dpdk: (W#18-3b:00.1) received packets 30652
[166909] Info: counters: Alerts: 0

I’ve also tried running an Intel i40e card (X710 for 10GbE SFP+, device 1572) and it works without any problem.

[225621] Notice: threads: Threads created -> W: 18 FM: 1 FR: 1   Engine started.
^C[225621] Notice: suricata: Signal Received.  Stopping engine.
[225621] Info: suricata: time elapsed 44.372s
[225696] Perf: flow-manager: 42296 flows processed
[225677] Perf: dpdk: Port 0 (0000:19:00.0) - rx_good_packets: 1000001
[225677] Perf: dpdk: Port 0 (0000:19:00.0) - rx_good_bytes: 760030842
[225677] Perf: dpdk: Port 0 (0000:19:00.0) - rx_unicast_packets: 963516
[225677] Perf: dpdk: Port 0 (0000:19:00.0) - rx_multicast_packets: 36485
[225677] Perf: dpdk: Port 0 (0000:19:00.0) - rx_unknown_protocol_packets: 1000001
[225677] Perf: dpdk: Port 0 (0000:19:00.0) - mac_remote_errors: 1
[225677] Perf: dpdk: Port 0 (0000:19:00.0) - rx_size_64_packets: 173189
[225677] Perf: dpdk: Port 0 (0000:19:00.0) - rx_size_65_to_127_packets: 267803
[225677] Perf: dpdk: Port 0 (0000:19:00.0) - rx_size_128_to_255_packets: 35117
[225677] Perf: dpdk: Port 0 (0000:19:00.0) - rx_size_256_to_511_packets: 22604
[225677] Perf: dpdk: Port 0 (0000:19:00.0) - rx_size_512_to_1023_packets: 24173
[225677] Perf: dpdk: Port 0 (0000:19:00.0) - rx_size_1024_to_1522_packets: 477115
[225677] Perf: dpdk: Total RX stats of 0000:19:00.0: packets 1000001 bytes: 760030842 missed: 0 errors: 0 nombufs: 0
[225677] Perf: dpdk: (W#01-19:00.0) received packets 64021
[225678] Perf: dpdk: (W#02-19:00.0) received packets 52387
[225679] Perf: dpdk: (W#03-19:00.0) received packets 61623
[225680] Perf: dpdk: (W#04-19:00.0) received packets 43982
[225681] Perf: dpdk: (W#05-19:00.0) received packets 54930
[225682] Perf: dpdk: (W#06-19:00.0) received packets 53220
[225683] Perf: dpdk: (W#07-19:00.0) received packets 58025
[225684] Perf: dpdk: (W#08-19:00.0) received packets 77304
[225685] Perf: dpdk: (W#09-19:00.0) received packets 55814
[225686] Perf: dpdk: (W#10-19:00.0) received packets 55717
[225687] Perf: dpdk: (W#11-19:00.0) received packets 42358
[225688] Perf: dpdk: (W#12-19:00.0) received packets 64890
[225689] Perf: dpdk: (W#13-19:00.0) received packets 38128
[225690] Perf: dpdk: (W#14-19:00.0) received packets 56048
[225691] Perf: dpdk: (W#15-19:00.0) received packets 48316
[225692] Perf: dpdk: (W#16-19:00.0) received packets 71080
[225693] Perf: dpdk: (W#17-19:00.0) received packets 51190
[225694] Perf: dpdk: (W#18-19:00.0) received packets 50968
[225621] Info: counters: Alerts: 0
[225621] Perf: ippair: ippair memory usage: 414144 bytes, maximum: 16777216
[225621] Perf: host: host memory usage: 398144 bytes, maximum: 33554432
[225621] Notice: device: 0000:19:00.0: packets: 1000001, drops: 0 (0.00%), invalid chksum: 0
[225621] Info: dpdk: 0000:19:00.0: closing device

Thank you for your feedback.

To add, I assume your build was based on the master branch from December.
In Suricata 7.0.0-rc1 there is also output from DPDK xstats (extended port statistics). These might hint at where the problem lies, and they are especially helpful when troubleshooting performance problems.
They are printed at the end of a Suricata run and are enabled with the highest verbosity setting (-vvvv).
The per-queue stats only cover up to 16 NIC queues; this is a DPDK limitation that can only be changed when DPDK is compiled, not afterward (the limit is defined in the RTE_ETHDEV_QUEUE_STAT_CNTRS macro in the DPDK code). Beyond that, they are mainly useful for counters like rx_missed_errors or rx_out_of_buffer.

Example:

[166965] Perf: dpdk: Port 2 (0000:3b:00.1) - rx_good_packets: 999957
[166965] Perf: dpdk: Port 2 (0000:3b:00.1) - rx_good_bytes: 759990562
[166965] Perf: dpdk: Port 2 (0000:3b:00.1) - rx_missed_errors: 42
[166965] Perf: dpdk: Port 2 (0000:3b:00.1) - rx_q0_packets: 44349
[166965] Perf: dpdk: Port 2 (0000:3b:00.1) - rx_q0_bytes: 29614870
[166965] Perf: dpdk: Port 2 (0000:3b:00.1) - rx_q1_packets: 67055
[166965] Perf: dpdk: Port 2 (0000:3b:00.1) - rx_q1_bytes: 58968263
[166965] Perf: dpdk: Port 2 (0000:3b:00.1) - rx_q2_packets: 72336
[166965] Perf: dpdk: Port 2 (0000:3b:00.1) - rx_q2_bytes: 60760101
[166965] Perf: dpdk: Port 2 (0000:3b:00.1) - rx_q3_packets: 52564
[166965] Perf: dpdk: Port 2 (0000:3b:00.1) - rx_q3_bytes: 42732403
[166965] Perf: dpdk: Port 2 (0000:3b:00.1) - rx_q4_packets: 84288
[166965] Perf: dpdk: Port 2 (0000:3b:00.1) - rx_q4_bytes: 39410608
[166965] Perf: dpdk: Port 2 (0000:3b:00.1) - rx_q5_packets: 38471
[166965] Perf: dpdk: Port 2 (0000:3b:00.1) - rx_q5_bytes: 25246260
[166965] Perf: dpdk: Port 2 (0000:3b:00.1) - rx_q6_packets: 31397
[166965] Perf: dpdk: Port 2 (0000:3b:00.1) - rx_q6_bytes: 15830004
[166965] Perf: dpdk: Port 2 (0000:3b:00.1) - rx_q7_packets: 44704
[166965] Perf: dpdk: Port 2 (0000:3b:00.1) - rx_q7_bytes: 34260795
[166965] Perf: dpdk: Port 2 (0000:3b:00.1) - rx_q8_packets: 73968
[166965] Perf: dpdk: Port 2 (0000:3b:00.1) - rx_q8_bytes: 68582879
[166965] Perf: dpdk: Port 2 (0000:3b:00.1) - rx_q9_packets: 70092
[166965] Perf: dpdk: Port 2 (0000:3b:00.1) - rx_q9_bytes: 56182609
[166965] Perf: dpdk: Port 2 (0000:3b:00.1) - rx_q10_packets: 37563
[166965] Perf: dpdk: Port 2 (0000:3b:00.1) - rx_q10_bytes: 23987412
[166965] Perf: dpdk: Port 2 (0000:3b:00.1) - rx_q11_packets: 50650
[166965] Perf: dpdk: Port 2 (0000:3b:00.1) - rx_q11_bytes: 39839318
[166965] Perf: dpdk: Port 2 (0000:3b:00.1) - rx_q12_packets: 54413
[166965] Perf: dpdk: Port 2 (0000:3b:00.1) - rx_q12_bytes: 43350681
[166965] Perf: dpdk: Port 2 (0000:3b:00.1) - rx_q13_packets: 54987
[166965] Perf: dpdk: Port 2 (0000:3b:00.1) - rx_q13_bytes: 43155566
[166965] Perf: dpdk: Port 2 (0000:3b:00.1) - rx_q14_packets: 96055
[166965] Perf: dpdk: Port 2 (0000:3b:00.1) - rx_q14_bytes: 91191000
[166965] Perf: dpdk: Port 2 (0000:3b:00.1) - rx_q15_packets: 55864
[166965] Perf: dpdk: Port 2 (0000:3b:00.1) - rx_q15_bytes: 41615093
[166965] Perf: dpdk: Port 2 (0000:3b:00.1) - rx_unicast_packets: 963516
[166965] Perf: dpdk: Port 2 (0000:3b:00.1) - rx_unicast_bytes: 724508317
[166965] Perf: dpdk: Port 2 (0000:3b:00.1) - rx_multicast_packets: 36483
[166965] Perf: dpdk: Port 2 (0000:3b:00.1) - rx_multicast_bytes: 39522261
[166965] Perf: dpdk: Port 2 (0000:3b:00.1) - rx_phy_packets: 999999
[166965] Perf: dpdk: Port 2 (0000:3b:00.1) - rx_phy_bytes: 764030578
[166965] Perf: dpdk: Port 2 (0000:3b:00.1) - rx_out_of_buffer: 42
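
If per-queue xstats beyond 16 queues are ever needed, a rough sketch of raising the limit (assuming DPDK is built from source; the macro is defined in config/rte_config.h):

# in the DPDK source tree, raise the compile-time limit in config/rte_config.h, e.g.
# #define RTE_ETHDEV_QUEUE_STAT_CNTRS 32
# then rebuild and reinstall DPDK, and likely rebuild Suricata against it
meson setup build
ninja -C build
ninja -C build install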