High Packet Drop Rate with DPDK compared to AF_PACKET in Suricata 7.0.7

I hope this message finds you well. I’m reaching out to seek assistance regarding a significant performance issue I’ve been experiencing with Suricata version 7.0.7 RELEASE running in IDS mode on my system. Specifically, I’m encountering a high packet drop rate (~45%) when operating in DPDK run mode, whereas the performance with AF_PACKET is notably better.


System Overview

  • Suricata Version: 7.0.7 installed from source
  • Operating Mode: IDS
  • Hardware Specifications:
    • CPU: 20 logical CPUs (0-19)
  $ lscpu -e
  CPU NODE SOCKET CORE L1d:L1i:L2:L3 ONLINE    MAXMHZ   MINMHZ     MHZ
    0    0      0    0 0:0:0:0           si 4900,0000 800,0000 800.000
    1    0      0    0 0:0:0:0           si 4900,0000 800,0000 800.000
    2    0      0    1 4:4:1:0           si 4900,0000 800,0000 800.000
    3    0      0    1 4:4:1:0           si 4900,0000 800,0000 800.000
    4    0      0    2 8:8:2:0           si 4900,0000 800,0000 800.000
    5    0      0    2 8:8:2:0           si 4900,0000 800,0000 800.000
    6    0      0    3 12:12:3:0         si 4900,0000 800,0000 800.000
    7    0      0    3 12:12:3:0         si 4900,0000 800,0000 800.000
    8    0      0    4 16:16:4:0         si 5000,0000 800,0000 800.000
    9    0      0    4 16:16:4:0         si 5000,0000 800,0000 800.574
   10    0      0    5 20:20:5:0         si 5000,0000 800,0000 800.000
   11    0      0    5 20:20:5:0         si 5000,0000 800,0000 800.000
   12    0      0    6 24:24:6:0         si 4900,0000 800,0000 848.286
   13    0      0    6 24:24:6:0         si 4900,0000 800,0000 800.000
   14    0      0    7 28:28:7:0         si 4900,0000 800,0000 800.000
   15    0      0    7 28:28:7:0         si 4900,0000 800,0000 800.000
   16    0      0    8 36:36:9:0         si 3800,0000 800,0000 800.000
   17    0      0    9 37:37:9:0         si 3800,0000 800,0000 800.000
   18    0      0   10 38:38:9:0         si 3800,0000 800,0000 800.000
   19    0      0   11 39:39:9:0         si 3800,0000 800,0000 800.001
  • Network Interface: Intel X540-T2 bound to the vfio-pci driver
  • PCI Address: 0000:05:00.0 (the interface has no IP address because it is bound to vfio-pci; a binding sketch follows this list)
  • OS: Ubuntu 22.04
  • Memory: 64 GB
    • HugePages: 4096 GB
  $  grep Huge /proc/meminfo 
AnonHugePages:         0 kB
ShmemHugePages:     8192 kB
FileHugePages:         0 kB
HugePages_Total:    4096
HugePages_Free:     4095
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
Hugetlb:         8388608 kB
  • Files:
    • suricata.yaml: Attached
    • suricata.log: Attached
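
(For reference, binding the port to vfio-pci is done roughly like this with DPDK's dpdk-devbind.py helper - only a sketch, and the exact module/IOMMU setup depends on the system:)

# load the vfio-pci module (requires IOMMU support enabled in BIOS/kernel)
sudo modprobe vfio-pci
# bind the X540 port to vfio-pci so DPDK can take it over
sudo dpdk-devbind.py --bind=vfio-pci 0000:05:00.0
# verify the binding
dpdk-devbind.py --status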

Suricata Configuration Highlights

Below are the key configurations from my suricata.yaml that pertain to this issue:

dpdk:
  eal-params:
    proc-type: primary
    allow: ["0000:05:00.0"]
  interfaces:
    - interface: 0000:05:00.0
      threads: 8
      promisc: true
      multicast: false
      checksum-checks: false
      checksum-checks-offload: false
      mtu: 1500
      mempool-size: 262144
      mempool-cache-size: 512
      rx-descriptors: 4096 
      tx-descriptors: 4096 
      copy-mode: none
      copy-iface: none
      rss-hash-functions: auto

threading:
  set-cpu-affinity: yes
  cpu-affinity:
    - management-cpu-set:
        cpu: [16,17]
    - receive-cpu-set:
        cpu: [18]
    - verdict-cpu-set:
        cpu: [19]
    - worker-cpu-set:
        cpu: [0,2,4,6,8,10,12,14]
        mode: exclusive
        prio:
          default: high
  detect-thread-ratio: 1.0
  stack-size: 8mb

Additional Notable Configurations:

  • Mempool Configuration:
    • mempool-size: 262144
    • mempool-cache-size: 512
  • RX and TX Descriptors:
    • Both set to 4,096 per queue
  • Threading:
    • 8 worker threads assigned to cores 0,2,4,6,8,10,12,14
    • Management threads on cores 16-17, the receive thread on core 18, and the verdict thread on core 19
  • App-Layer Protocols:
    • Multiple protocols enabled (HTTP, TLS, SSH, etc.) with specific detection ports
  • Runmode: Workers

Observed Performance Issues

When running Suricata in DPDK mode, the following metrics were observed from the logs:

  • Total Packets Received: ~17,559,078
  • Packets Dropped (rx_missed_errors): ~7,943,058
  • Packet Drop Percentage: Approximately 45.24%

Comparison with AF_PACKET Mode:

  • In AF_PACKET mode, the packet drop rate is significantly lower, and overall performance is more stable and efficient.
    • Total packets: 17,796,504
    • Drops: 3,787,138
    • Percentage: 21.28%

Requests for Assistance

Given the complexity of the issue and the critical nature of maintaining low packet drop rates for effective intrusion detection, I kindly request the community’s assistance with the following:

  1. Configuration Review:

    • Please review the attached suricata.yaml and suricata.log files to identify any misconfigurations or areas for optimization that I might have overlooked.
  2. Additional Optimization Tips:

    • Any other settings or optimizations that could help reduce the packet drop rate and enhance Suricata’s performance in DPDK mode.

Attached Files

For your reference and detailed analysis, I have attached the following files:

  1. suricata.yaml: Comprehensive configuration file outlining all current settings.
  2. suricata.log: Log output capturing the initialization, configuration, and performance metrics during a run in DPDK mode.

Best regards,
Álvaro

suricata.log (76.7 KB)
suricata.yaml (88.2 KB)

Some comments:

  1. You actually have 8 GB of hugepage memory allocated - you have allocated 4096 hugepages of 2048 kB each == 8 GB.
  2. For your CPU settings it would be good to know which CPU you have and whether Hyperthreading is enabled. If HT is enabled, it is worth determining which logical CPUs are hyperthread siblings. You can find the pairs by looking into /proc/cpuinfo and matching the core id values (see the sketch after this list). If you only use 8 cores, use independent cores only - hyperthreaded siblings may boost performance a little, but 2 individual cores will always be better than 2 hyperthreaded ones.
  3. This CPU alignment applies to the management CPUs as well (although to a lesser degree, since the work there is not as demanding).
  4. The receive/verdict CPU sets are not used in the workers runmode.
  5. I would set the mempool size to 262143 and the mempool cache to 511 - DPDK's internal math favors these values (the mempool documentation recommends a size of 2^n - 1 for optimal memory usage), but this very likely isn't the cause of the issue.
  6. In my experience with DPDK, it is better to set the RX/TX descriptors to 32768. However, Intel cards (at least the X710) only support 4096 descriptors, which makes them a poor fit for Suricata. I am not sure how the X540 behaves, but I assume it will be the same - try setting 32768 RX/TX descriptors, run Suricata in very verbose mode (-vvvv), and check whether it logs anything about lowering the descriptor counts.
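
(A quick way to see the hyperthread sibling pairs mentioned in point 2 - both commands below are standard Linux tooling:)

# each line lists the logical CPUs that share one physical core (hyperthread siblings)
grep . /sys/devices/system/cpu/cpu*/topology/thread_siblings_list
# alternatively, logical CPUs that report the same CORE value are siblings
lscpu -e=CPU,CORE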

Hi Lukas. Thanks for answering

  1. I set 8 GB of HugePages because I saw this line in suricata.log:
[181614 - Suricata-Main] 2024-10-14 13:40:37 Perf: hugepages: Hugepages on NUMA node 0 can be set to 348 (only using 302/4095 2048kB hugepages)

So I reduced the size. I had also tried setting the hugepage allocation to 20 GB, but execution performance did not improve.
Following your answer, I have set the hugepage allocation to 20 GB again, and the line now logged in suricata.log is:

Perf: hugepages: Hugepages on NUMA node 0 can be set to 347 (only using 301/20479 2048kB hugepages) [SystemHugepageEvaluateHugepages:util-hugepages.c:406]

2, 3. Yes, I have Hyperthreading enabled. The CPU characteristics are:

  • Model: 12th Gen Intel(R) Core(TM) i7-12700K
  • Physical Cores: 12
  • Total threads: 20
  • Hyperthreading: enabled
  • Hyperthreaded cores: (Core ID: Processor)
    • 0: Processors 0 & 1
    • 4: Processors 2 & 3
    • 8: Processors 4 & 5
    • 12: Processors 6 & 7
    • 16: Processors 8 & 9
    • 20: Processors 10 & 11
    • 24: Processors 12 & 13
    • 28: Processors 14 & 15
  • Independent cores (Non-Hyperthreaded): (Core ID: Processor)
    • 36: Processor 16
    • 37: Processor 17
    • 38: Processor 18
    • 39: Processor 19

Summary:

  • Total Physical Cores: 12
    • 8 Performance (P) Cores with Hyperthreading: Core IDs 0, 4, 8, 12, 16, 20, 24, 28
    • 4 Efficient (E) Cores without Hyperthreading: Core IDs 36, 37, 38, 39

To assign the cores in cpu-affinity, I thought that using only one logical processor per physical core (instead of both hyperthread siblings) would be better. But now I don't know what to put in the cpu-affinity section. Can you help me?

As a first approach, I thought it would be better to assign the P-cores to the worker CPU set and the E-cores to management, resulting in this config:

threading:
  cpu-affinity:
  - management-cpu-set:
      cpu: [ 16-19 ]
  - worker-cpu-set:
      cpu: [ 0,2,4,6,8,10,12,14 ]
      mode: exclusive
      prio:
        default: high

I would be very grateful if you could tell me which option is best.

  4. Okay, I have deleted both (the receive and verdict CPU sets) from the cpu-affinity section.
  5. I set those values as you suggested, but they do not seem to be the root cause of the poor performance.
  6. When setting the RX/TX descriptors to 32768, exactly what you described happens. In suricata.log I can see this:
[7722 - Suricata-Main] 2024-10-16 11:39:17 Warning: dpdk: 0000:05:00.0: device queue descriptors adjusted (RX: from 32768 to 4096, TX: from 32768 to 4096)

So the NIC is capping the RX/TX descriptors at 4096. Could switching to another NIC that allows a higher RX/TX descriptor count be a way to get better performance in DPDK mode?

Hope you can help me.

Thank you very much!


Hi @vendul0g,

thanks for the detailed report.

Hugepages
The hugepages message is meant to say that you have overallocated hugepages and some can be freed. In your case you don't even need to allocate 8 GB - 1 GB is enough - and the system then has more memory available for other things. Increasing the hugepage allocation will not help, because Suricata simply won't use the extra pages.
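
(For example, shrinking the pool to 1 GB of 2 MB hugepages on NUMA node 0 can be done like this - the page count is only an illustration, use whatever the hugepages log line suggests:)

# allocate 512 x 2 MB hugepages (= 1 GB) on NUMA node 0
echo 512 | sudo tee /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages
# verify the allocation
grep Huge /proc/meminfo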

CPU affinity
Yes, your thinking is correct, assigning 0,2,4,6,8,10,12,14 seems like the best idea. For the management cores, I would maybe reduce the CPU list to 16-17, leaving 18-19 for other system tasks.

NIC compatibility
I have had good experience with Mellanox/NVIDIA NICs; here is a list of what I work with:

  • MCX516A-CCAT
  • MCX623106AE-CDAT
  • MCX75310AAS-NEAT

With these cards, you can set a higher number of RX/TX descriptors, which should solve the issue.
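
(If you want to check what a given NIC supports while it is still on its kernel driver, ethtool reports the hardware ring limits - the interface name below is just an example:)

# "Pre-set maximums" shows the largest RX/TX ring (descriptor count) the hardware supports
ethtool -g enp5s0f0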

But I don't understand how it is possible that AF_PACKET mode can perform better than DPDK on the same hardware.

Have you ever seen something like that?

This is my AF-PACKET config setup:

af-packet:
  - interface: enp5s0f0
    threads: 20
    cluster-id: 99
    cluster-type: cluster_qm
    defrag: yes
    use-mmap: yes
    mmap-locked: yes
    tpacket-v3: yes
    use-emergency-flush: yes
    disable-promisc: no
    checksum-checks: no

In AF_PACKET mode I use the ixgbe driver; for the DPDK setup I bind the NIC to vfio-pci. I don't know whether this could be the cause of the difference.

To my knowledge, with AF_PACKET the kernel receives packets into its own structures independently of Suricata, moving them from the HW queues into software queues, and those software queues are bigger than the HW queues.

With DPDK, Suricata receives packets directly from the HW queues, and it reads them only in batches: it has to process one batch before it can fetch the next. These queues can fill up during workload bursts, whereas the kernel keeps receiving data continuously, independently of Suricata's processing, so workload spikes are handled better.

I stumbled upon this problem here; I forgot to reply to André, though.

@vendul0g, if you compile DPDK yourself, there is one suggestion in that thread: increase the RX/TX descriptor limit by editing the respective driver files in DPDK (and then compiling and installing it), as André and I suggested.

DPDK compilation is pretty easy; once cloned, you only need something like:

# uninstall the previously installed DPDK first
meson --prefix=/usr/ build   # for a system-wide installation
ninja -C build
sudo ninja -C build install

Hi Lukas. It works!

As you suggested, I rebuilt DPDK with the modified IXGBE_MAX_RING_DESC values, and it’s working.

I’ll explain the process here in case anyone else needs guidance. In my case, I have an X540-AT2 NIC, which limits the RX and TX descriptors to 4096. To identify this issue, I ran Suricata in verbose mode: suricata --dpdk -vvvv, and noticed the following entries in the logs: [7722 - Suricata-Main] 2024-10-16 11:39:17 Warning: dpdk: 0000:05:00.0: device queue descriptors adjusted (RX: from 32768 to 4096, TX: from 32768 to 4096).

Because of this restriction, DPDK’s performance was very slow, to the point where even AF_PACKET mode outperformed DPDK.

To resolve this, you first need to identify which driver your network card (NIC) uses—in my case, it’s ixgbe. Next, navigate to the directory where you have DPDK downloaded (or download it if you haven’t already). Then modify one file, depending on your NIC’s driver:

  • In dpdk/drivers/net/ixgbe/ixgbe_rxtx.h, set: #define IXGBE_MAX_RING_DESC 32768
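
(A possible one-liner for that edit, assuming the DPDK sources are checked out in ./dpdk - verify the resulting line in the header afterwards:)

# raise the ixgbe descriptor cap in the driver header
sed -i 's/define IXGBE_MAX_RING_DESC.*/define IXGBE_MAX_RING_DESC 32768/' dpdk/drivers/net/ixgbe/ixgbe_rxtx.h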

Finally, reinstall DPDK:

# uninstall the previously installed DPDK
meson --prefix=/usr/ build # for a system-wide installation
ninja -C build
sudo ninja -C build install

Thank you so much, Lukas. You’ve been an enormous help!


Hi vendul0g, great to hear that - I'm glad we pushed this through together.

This could actually be a good reason to send a patch to DPDK to raise the RX/TX descriptor limit, as it is capped unnecessarily low - I will try to follow up on this.
