Suricata 6.x not decoding MPLS packets efficiently

Manuals say modern Suricata supports decoding GRE/MPLS/etc. by default. With normal traffic my configuration uses all CPU cores. However, when analyzing MPLS traffic only one core is 100% busy and all the others sit at 0-1%. The mpls rows in stats.log show 0 everywhere, and there are no drops either. I tried playing with the set-cpu-affinity setting, but nothing changed.

ifconfig shows that the incoming traffic is huge, so there is definitely something to analyze. Suricata was installed from the recommended repo.

Some alerts are being generated, so it is not like they are missing entirely. However, in most cases the reported payload looks like 235.a..b679.abd..EH.. .

So it seems that MPLS decoding is not working as expected and is not efficient, since only one core is loaded, but at 100%. What do I need to configure to decode and analyze MPLS packets properly and efficiently?
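For reference, this is how I check those mpls counters (the stats.log path is the default on my install, yours may differ):

# look at the mpls decoder rows in stats.log
grep -i mpls /var/log/suricata/stats.log | tail -n 5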

What version are you running exactly?
Please share your suricata.yaml as well, and also which NIC is used and how you run Suricata.
One core being busy while the others are not could be related to the NIC together with the kernel not distributing the flows properly.
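A quick way to see how the load is actually spread, assuming the capture interface is eth1 (standard Linux tools, adjust the names to your setup):

# per-thread CPU usage of the Suricata process
top -H -p $(pidof suricata)

# how the NIC's receive interrupts are spread across CPU cores
grep eth1 /proc/interrupts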

Version: This is Suricata version 6.0.8 RELEASE
Systemd starts Suricata: /usr/bin/suricata -D -c /etc/suricata/suricata.yaml --af-packet=eth1

The config YAML file is attached, maybe you can find something inefficient in it:

suricata.yaml (70.6 KB)

What NIC is used, and do you optimize anything there as well?
What kernel is used?

One thing that would help: can you generate a pcap with that MPLS traffic? I have seen issues in the past with non-RFC-compliant MPLS and with multiple label layers.
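If capturing on the sensor itself is an option, something along these lines should be enough; the interface name and packet count are only examples, and tcpdump's mpls filter keyword matches MPLS-labelled frames:

# capture a small sample of MPLS traffic on eth1
tcpdump -ni eth1 -s 0 -c 1000 -w mpls-sample.pcap mpls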

My current guess is that the issue is with the flow distribution to the different cores.

Is there any command you need for the NIC info?
The Linux kernel is 5.4.0-128-generic, it runs inside a XEN VM, and I have applied no specific optimizations.

Unfortunately I can't share a traffic sample.

runmode: autofp successfully makes all 8 cores active. Should I leave it, or is it only useful for testing in my case?

You can run ethtool -i eth1, but the XEN VM is a good point. Maybe the NIC does something in virtualized mode that causes the issue.
Or it could be something in the interaction with the kernel in AF_PACKET mode inside the XEN VM.
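One thing worth checking in that situation (a general recommendation for AF_PACKET capture, not something XEN-specific) is whether NIC offloads are active on the capture interface, since they can interfere with packet capture:

# show the current offload settings of the capture interface
ethtool -k eth1

# offloads that are commonly disabled for IDS capture (check which ones your driver supports)
ethtool -K eth1 gro off lro off tso off gso off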

You can keep autofp if the performance is enough for you; in general autofp is much slower than workers, so if you run into drops you might have to rethink.
Also make sure all other stats are okay, and validate that the events look the way you expect.
At least it proves that the problem is related to the capture method.
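A quick way to keep an eye on the drop counters, assuming the default stats.log location (adjust the path to your setup):

# compare received vs. dropped packets over time
grep -E 'capture\.kernel_(packets|drops)' /var/log/suricata/stats.log | tail -n 10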

It seems runmode: autofp will not work. Even with 24 cores the drop rate is huge:

capture.kernel_packets          | Total       | 38754672
capture.kernel_drops            | Total       | 34060040

NIC info in VM:

driver: vif
version: 
firmware-version: 
expansion-rom-version: 
bus-info: vif-1
supports-statistics: yes
supports-test: no
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: no

Related NIC info outside of VM:

driver: bnxt_en
version: 5.15.0-52-generic
firmware-version: 222.0.138.0/pkg 22.21.06.80
expansion-rom-version: 
bus-info: 0000:4b:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: no

Any suggestions?

I have no experience with those specific virtual NIC drivers.
Maybe it’s missing some relevant functionality.
Could you also show the output of ethtool -l and ethtool -x, and, when you switch back to workers, ethtool -S eth1 as well, so we can see how the flow distribution is done at the NIC queue level.
Another helpful output would be sudo perf top -p $(pidof suricata).
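Another view on the flow distribution is Suricata's own per-thread capture counters; a sketch of the relevant part of the stats output section in suricata.yaml (option names as in the default 6.x config, double-check against your file):

outputs:
  - stats:
      enabled: yes
      filename: stats.log
      threads: yes    # per-thread counters show whether a single worker receives all packets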

ethtool output on the host (in the VM I get an "Operation not supported" error):

# ethtool -l eth1

Channel parameters for eth1:
Pre-set maximums:
RX:		37
TX:		37
Other:		n/a
Combined:	74
Current hardware settings:
RX:		0
TX:		0
Other:		n/a
Combined:	8
# ethtool -x eth1

RX flow hash indirection table for eth1 with 8 RX ring(s):
    0:      0     1     2     3     4     5     6     7
    8:      0     1     2     3     4     5     6     7
   16:      0     1     2     3     4     5     6     7
   24:      0     1     2     3     4     5     6     7
   32:      0     1     2     3     4     5     6     7
   40:      0     1     2     3     4     5     6     7
   48:      0     1     2     3     4     5     6     7
   56:      0     1     2     3     4     5     6     7
   64:      0     1     2     3     4     5     6     7
   72:      0     1     2     3     4     5     6     7
   80:      0     1     2     3     4     5     6     7
   88:      0     1     2     3     4     5     6     7
   96:      0     1     2     3     4     5     6     7
  104:      0     1     2     3     4     5     6     7
  112:      0     1     2     3     4     5     6     7
  120:      0     1     2     3     4     5     6     7

RSS hash function:
    toeplitz: on
    xor: off
    crc32: off

When I switch back to workers and run ethtool -S eth1:

NIC statistics:
     rx_gso_checksum_fixup: 0

And perf output:

Samples: 39K of event 'cpu-clock:pppH', 4000 Hz, Event count (approx.): 7788756963 lost: 0/0 drop: 0/0
Overhead  Shared Object       Symbol
  50.72%  suricata            [.] SigMatchSignaturesGetSgh
  42.46%  suricata            [.] DetectAddressMatchIPv4
   2.91%  libhs.so.5.2.1      [.] avx512_hs_reset_and_expand_stream
   1.97%  suricata            [.] DetectProtoContainsProto
   0.27%  suricata            [.] DetectPortLookupGroup
   0.26%  libpthread-2.31.so  [.] pthread_mutex_unlock
   0.26%  libpthread-2.31.so  [.] pthread_mutex_trylock
   0.21%  suricata            [.] SCHSSearch
   0.10%  suricata            [.] SCGetProtoByName
   0.07%  suricata            [.] StreamTcp
   0.05%  suricata            [.] FlowGetFlowFromHash
   0.05%  libpthread-2.31.so  [.] __pthread_mutex_lock
   0.05%  suricata            [.] SCGetContext
   0.05%  suricata            [.] FlowVarPrint
   0.04%  suricata            [.] DefragTimeoutHash
   0.04%  libhs.so.5.2.1      [.] avx512_hs_scan
   0.03%  suricata            [.] DetectPortHashFree
   0.03%  suricata            [.] TmThreadsSlotVarRun
   0.03%  suricata            [.] Prefilter
   0.02%  suricata            [.] FlowQueuePrivateGetFromTop
   0.02%  suricata            [.] DetectSetFastPatternAndItsId
   0.02%  suricata            [.] OutputLoggerLog
   0.01%  suricata            [.] FlowSetupPacket
   0.01%  libc-2.31.so        [.] qsort
   0.01%  suricata            [.] PacketPoolReturnPacket
   0.01%  suricata            [.] FlowGetFromFlowKey
   0.01%  suricata            [.] AppLayerParserTransactionsCleanup
   0.01%  libc-2.31.so        [.] pthread_attr_setschedparam
   0.01%  suricata            [.] Detect
   0.01%  libc-2.31.so        [.] __nss_database_lookup
   0.01%  suricata            [.] AppLayerProtoDetectGetProto
   0.01%  suricata            [.] TagHandlePacket
   0.01%  suricata            [.] 0x00000000000d6970
   0.01%  libc-2.31.so        [.] free
   0.01%  suricata            [.] FlowHandlePacketUpdate
   0.01%  suricata            [.] StreamTcpPacket
   0.01%  suricata            [.] PacketFreeOrRelease
   0.01%  suricata            [.] DecodeIPV4
   0.01%  suricata            [.] DecodeSll
   0.01%  suricata            [.] StatsIncr
   0.01%  libc-2.31.so        [.] malloc
   0.01%  suricata            [.] FlowUpdateState
...

Hmm, so ethtool -S eth1 is not working within the VM? That's bad, because you could have seen something like this:

     rx-0.packets: 26084335494
     rx-0.bytes: 22498141868507
     tx-1.packets: 0
     tx-1.bytes: 0
     rx-1.packets: 26084355414
     rx-1.bytes: 22497102569859
     tx-2.packets: 0
     tx-2.bytes: 0
     rx-2.packets: 26079500782
     rx-2.bytes: 22493143048915
     tx-3.packets: 0
     tx-3.bytes: 0
     rx-3.packets: 26084827753
     rx-3.bytes: 22498095119352
     tx-4.packets: 0
     tx-4.bytes: 0
     rx-4.packets: 26078679894
     rx-4.bytes: 22490236645024

So in this example you can see that the RX queues are all more or less equally busy with packets.
That would have helped narrow things down.
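On a NIC that exposes these counters, something like this filters out just the per-queue packet counts (the counter names vary between drivers):

ethtool -S eth1 | grep -E 'rx-[0-9]+\.packets'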

The perf output is interesting, since the overhead of DetectAddressMatchIPv4 is quite high. You could also try playing around with the different cluster_ modes, but I doubt it will change much.
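For reference, the cluster mode is set per interface in the af-packet section of suricata.yaml; a minimal sketch assuming eth1 and 8 worker threads (the values are examples, not tuned recommendations):

af-packet:
  - interface: eth1
    threads: 8
    cluster-id: 99
    cluster-type: cluster_flow   # alternatives to try: cluster_cpu, cluster_qm
    defrag: yes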

It would be helpful if you could somehow recreate this in a test environment where you can capture a pcap and share it. Otherwise we would have to build such a setup ourselves to reproduce it, or maybe someone else from the community has experience with this kind of setup.
Maybe something is wrong in the MPLS traffic, or it uses something special that is valid but not well supported.
Another option would be to play around with other tools that use AF_PACKET and could show the flow distribution.

To be precise, ethtool -S eth1 does work in the VM, but the only output is:

NIC statistics:
     rx_gso_checksum_fixup: 0

But in another deployment (without MPLS) where Suricata is running, the output is the same.

I would still think that this is an issue with the interplay of the NIC (driver), the kernel, and Suricata. I have seen multiple MPLS setups working, so if you are able to create a test pcap we might be able to spot a crucial difference.
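Once you have such a pcap, you can replay it through Suricata offline and compare the decoder counters, roughly like this (paths and file names are examples):

# run Suricata offline against the sample, writing logs to a scratch directory
suricata -c /etc/suricata/suricata.yaml -r mpls-sample.pcap -l /tmp/suri-mpls-test

# check whether the MPLS decoder counted anything
grep -i mpls /tmp/suri-mpls-test/stats.log | tail -n 5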