DPDK packet loss

I’m currently using DPDK for packet capture, but I’m experiencing very high packet loss and I don’t know why.
This is the situation:
I use tcpreplay to replay a pcap file with 50 million packets at top speed, with the following command:

sudo tcpreplay -i eno4 -t test.pcap

On another machine I use Suricata + DPDK to process all the packets. The two servers are connected with fiber.

My server looks like this:

Suricata version: 7.0.2-release (compiled from source with dpdk support)
Linux version: Ubuntu 22.04 jammy
Kernel version: x86_64 Linux 5.15.0-91-generic
RAM: 128 GB
CPU: Intel Xeon E5-2630 v4 @ 40x 3.1GHz
NIC:
04:00.0 Ethernet controller: Intel Corporation Ethernet Controller X710 for 10GbE SFP+ (rev 02)
04:00.1 Ethernet controller: Intel Corporation Ethernet Controller X710 for 10GbE SFP+ (rev 02)
DPDK: 21.11.4-0ubuntu0.22.04.1 amd64 (installed from apt, precompiled)
NIC driver: vfio without IOMMU
System load without running Suricata: 0.11 0.29 1.80
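For reference, a minimal sketch of how the ports were bound to vfio-pci in no-IOMMU mode with the standard DPDK tooling (the exact commands are assumed, not copied from my shell history):

sudo modprobe vfio-pci
echo 1 | sudo tee /sys/module/vfio/parameters/enable_unsafe_noiommu_mode
sudo dpdk-devbind.py --bind=vfio-pci 0000:04:00.0 0000:04:00.1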

lscpu output looks like:

Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 46 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 40
On-line CPU(s) list: 0-39
Vendor ID: GenuineIntel
Model name: Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz
CPU family: 6
Model: 79
Thread(s) per core: 2
Core(s) per socket: 10
Socket(s): 2
Stepping: 1
CPU max MHz: 3100.0000
CPU min MHz: 1200.0000
BogoMIPS: 4400.02
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm rdt_a rdseed adx smap intel_pt xsaveopt cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts md_clear flush_l1d
Virtualization features:
Virtualization: VT-x
Caches (sum of all):
L1d: 640 KiB (20 instances)
L1i: 640 KiB (20 instances)
L2: 5 MiB (20 instances)
L3: 50 MiB (2 instances)
NUMA:
NUMA node(s): 2
NUMA node0 CPU(s): 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
NUMA node1 CPU(s): 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
Vulnerabilities:
Gather data sampling: Not affected
Itlb multihit: KVM: Mitigation: VMX disabled
L1tf: Mitigation; PTE Inversion; VMX conditional cache flushes, SMT vulnerable
Mds: Mitigation; Clear CPU buffers; SMT vulnerable
Meltdown: Mitigation; PTI
Mmio stale data: Mitigation; Clear CPU buffers; SMT vulnerable
Retbleed: Not affected
Spec rstack overflow: Not affected
Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Spectre v2: Mitigation; Retpolines, IBPB conditional, IBRS_FW, STIBP conditional, RSB filling, PBRSB-eIBRS Not affected
Srbds: Not affected
Tsx async abort: Mitigation; Clear CPU buffers; SMT vulnerable

My suricata.yaml:
suricata.yaml (83.3 KB)

Suricata console output:

EAL: No available 2048 kB hugepages reported
TELEMETRY: No legacy callbacks, legacy socket not created
i: conf: unable to find interface default in DPDK config
i: log-pcap: Ring buffer initialized with 3 files.
i: log-pcap: Ring buffer initialized with 2 files.
i: log-pcap: Ring buffer initialized with 2 files.
i: log-pcap: Ring buffer initialized with 2 files.
i: log-pcap: Ring buffer initialized with 2 files.
i: log-pcap: Ring buffer initialized with 3 files.
i: log-pcap: Ring buffer initialized with 2 files.
i: log-pcap: Ring buffer initialized with 2 files.
i: log-pcap: Ring buffer initialized with 2 files.
i: log-pcap: Ring buffer initialized with 4 files.
i: log-pcap: Ring buffer initialized with 2 files.
i: log-pcap: Ring buffer initialized with 3 files.
i: log-pcap: Ring buffer initialized with 2 files.
i: log-pcap: Ring buffer initialized with 2 files.
i: log-pcap: Ring buffer initialized with 3 files.
i: log-pcap: Ring buffer initialized with 31 files.
i: log-pcap: Ring buffer initialized with 2 files.
i: log-pcap: Ring buffer initialized with 2 files.
i: log-pcap: Ring buffer initialized with 2 files.
i: threads: Threads created → W: 19 FM: 1 FR: 1 Engine started.
^Ci: suricata: Signal Received. Stopping engine.
i: device: 0000:04:00.1: packets: 52680614, drops: 11729350 (22.27%), invalid chksum: 106

stats.log:

capture.packets | Total | 52680614
capture.rx_errors | Total | 11729350
capture.dpdk.imissed | Total | 11729350
decoder.pkts | Total | 40951264
decoder.bytes | Total | 5645069841
decoder.invalid | Total | 121
decoder.ipv4 | Total | 40563972
decoder.ipv6 | Total | 323342
decoder.ethernet | Total | 40951591

I’ve tried many different settings but I still see around 20% packet loss. Can anyone help me?

Hi Sig,

Can you share how you run Suricata?
Are you losing packets with other capture modes (e.g. AF-PACKET) too?
What’s the speed of replay?
Do you lose packets if you don’t capture the traffic?

Considering your config, I would increase the number of RX descriptors so the workers have a bigger buffer for received packets.
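Something along these lines in the dpdk: section of suricata.yaml (the values are only a starting point, not tuned for your box):

dpdk:
  interfaces:
    - interface: 0000:04:00.1
      threads: auto
      rx-descriptors: 4096     # larger RX ring absorbs bursts while the workers catch up
      tx-descriptors: 4096
      mempool-size: 65535
      mempool-cache-size: 257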

Hi,

Yes, I’ve tried running Suricata in three cases: DPDK over fiber, AF_PACKET over fiber, and AF_PACKET over copper. All of them lose packets at a 600 Mbps replay speed. Even if I don’t capture the flow replayed by tcpreplay, I still see packet loss.
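For the 600 Mbps runs the replay is rate-limited instead of using -t, roughly like this (the exact invocation is assumed, using tcpreplay’s --mbps option):

sudo tcpreplay -i eno4 --mbps=600 test.pcap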

By the way, I’ve already increased the RX descriptors to 4096, removed all rules, and disabled most of Suricata’s logging features to lower the load on Suricata. But I still get a small amount of packet loss (260k packets out of 50 million total, 0.41%).

I think in this situation both DPDK and Suricata should be able to handle all the traffic without any trouble, but rx_errors keeps haunting me like a ghost.

Thank you for your help!!!

With 600 Mbps you could try to lower the number of workers, even to e.g. 4 or 8 workers (while disabling all the output/detection/PCAP-capture capabilities, as you mentioned), just to see whether you are hitting some synchronization issue.
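A rough sketch of what I mean; the per-interface thread count in the dpdk: section and the worker-cpu-set have to stay in sync:

dpdk:
  interfaces:
    - interface: 0000:04:00.1
      threads: 4                # request only 4 worker threads for this port
threading:
  cpu-affinity:
    - worker-cpu-set:
        cpu: [ 2, 4, 6, 8 ]     # at least as many cores here as 'threads' above
        mode: "exclusive"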

I’ve also noticed your management threads have affinity set to 0,1, where core 1 is on the other NUMA node.
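You can check which NUMA node the capture port is attached to with something like:

cat /sys/bus/pci/devices/0000:04:00.1/numa_node

and then keep both the management and worker cores on that node.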

Yeah, I’ve found that the packet loss is affected by which features I enable in the Suricata configuration.

So I should set CPUs 0 and 2 for the management threads?

So I should set CPUs 0 and 2 for the management threads?

Yes, but don’t forget to remove core 2 from the worker list.
Or you can try to use just one core for the management threads; at 600 Mbps it should be OK. Then watch the CPU utilization: if it is hitting 90+%, add one more core.
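For example, something like:

  cpu-affinity:
    - management-cpu-set:
        cpu: [ 0 ]    # one management core; add core 2 only if it sits at 90+% utilization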

Sorry for replying so late, I was on holiday.

Sadly, this method doesn’t work. Actually, I cannot set the number of workers to 4 or 6; DPDK reports this error:

[3517535] Error: dpdk: Interfaces requested more cores than configured in the threading section (requested 16 configured 6)
[3517535] Error: dpdk: DPDK configuration could not be parsed
Then I changed my settings to the following:

  cpu-affinity:
    - management-cpu-set:
        cpu: [ 0, 2 ]  # include only these CPUs in affinity settings
    - receive-cpu-set:
        cpu: [ 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31]  # include only these CPUs in affinity settings
    - worker-cpu-set:
        cpu: [ 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36 ]
        mode: "exclusive"
        # Use explicitly 3 threads and don't compute number by using
        # detect-thread-ratio variable:
        # threads: 3
        prio:
          low: [ 0 ]
          medium: [ "1-2" ]
          high: [ 3 ]
          default: "medium"

With this configuration Suricata still has 3.37% packet loss, which means it drops 1.77 million packets.
If I disable all Suricata rules, the packet loss rate goes down to 0.47% (246k packets dropped).

I think I found the problem: this CPU is too old for this task. I just ran sysbench on this server and on another server.
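The numbers below are from sysbench’s single-threaded CPU test, presumably invoked roughly like this (exact options may differ):

sysbench cpu run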
This server (system load 5.0):

CPU speed:
    events per second:   816.79

General statistics:
    total time:                          10.0002s
    total number of events:              8170

Latency (ms):
         min:                                    1.22
         avg:                                    1.22
         max:                                    1.35
         95th percentile:                        1.23
         sum:                                 9995.55

Threads fairness:
    events (avg/stddev):           8170.0000/0.00
    execution time (avg/stddev):   9.9956/0.00

The other server (system load 44.0, running Suricata 6 + PF_RING on two X710 NICs, 8 Gbps of traffic, 40% packet loss):

CPU speed:
    events per second:  2252.23

General statistics:
    total time:                          10.0004s
    total number of events:              22528

Latency (ms):
         min:                                    0.40
         avg:                                    0.44
         max:                                    1.47
         95th percentile:                        0.51
         sum:                                 9989.98

Threads fairness:
    events (avg/stddev):           22528.0000/0.00
    execution time (avg/stddev):   9.9900/0.00

So do you think this packet loss is acceptable? I mean, do you think that most of the packet loss wasn’t caused by my configuration file but by this slow CPU?

So on the better server, you still have 40% packet loss?

So do you think this packet loss is acceptable? I mean, do you think that most of the packet loss wasn’t caused by my configuration file but by this slow CPU?

I guess it is hard to tell; we don’t know the configuration of the other server. But generally your config looks OK, I don’t see any major issues.