Suricata IDS in worker mode with af-packet cluster_qm on a router—"pkt seen on wrong thread" error

Hi all,

I’m trying to integrate Suricata into an open-source Linux router distribution. I have a basic IDS setup using autofp mode working just fine, but I have been experimenting with ways to improve performance by deploying “worker” mode with proper RSS on the NIC.

Despite following the guide here and configuring symmetric RSS, etc., I am still getting a few “pkt seen on wrong thread” alerts, so I am looking for a sanity check of my config and of my understanding of the kernel mechanisms at play.

The basic setup is as follows:

  • The system already runs a router and stateful firewall/NAT. There is a WAN interface and a LAN interface and standard Linux packet forwarding in play.
  • Suricata 6.0.10 (yes I know we need to upgrade)
  • Suricata running in IDS mode only, no IPS at this time.
  • To get packets into Suricata, I’m using af-packet mode configured to listen on the LAN interface.
  • Test system is a 12-core Intel C3858, using built-in Intel X553 10GbE NIC configured with 12 receive queues.
  • Tuned the NIC so it should provide symmetric RSS hashing using ethtool -X eth1 hkey 6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A equal 12 (and followed all other steps in the guide linked above)
  • Debian Bookworm with kernel 6.6.45
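
As a sanity check on the hkey above, here is a minimal Python sketch of the Toeplitz hash that RSS is generally described as using, fed with the repeating 6D:5A key. It only illustrates why a key with a 16-bit period hashes both directions of a flow to the same value. The helper names (`toeplitz_hash`, `flow_hash`, `rss_queue`) are hypothetical, and the 128-entry indirection table filled round-robin by `equal 12` is an assumption, not something read back from the NIC.

```python
import ipaddress
import struct

# The 40-byte key from the ethtool command: 6D:5A repeated 20 times.
SYMMETRIC_KEY = bytes.fromhex("6d5a" * 20)


def toeplitz_hash(key: bytes, data: bytes) -> int:
    """XOR together the 32-bit key window at every set bit of the input."""
    key_int = int.from_bytes(key, "big")
    key_bits = len(key) * 8
    result = 0
    for i in range(len(data) * 8):
        if data[i // 8] & (0x80 >> (i % 8)):
            # 32-bit window of the key starting at bit offset i
            result ^= (key_int >> (key_bits - 32 - i)) & 0xFFFFFFFF
    return result


def flow_hash(src: str, dst: str, sport: int, dport: int,
              key: bytes = SYMMETRIC_KEY) -> int:
    """Hash the tuple in the order src addr, dst addr, src port, dst port."""
    data = (ipaddress.ip_address(src).packed
            + ipaddress.ip_address(dst).packed
            + struct.pack("!HH", sport, dport))
    return toeplitz_hash(key, data)


def rss_queue(src: str, dst: str, sport: int, dport: int,
              n_queues: int = 12, table_size: int = 128) -> int:
    """Assumed mapping: hash -> indirection table slot -> queue (round-robin)."""
    return (flow_hash(src, dst, sport, dport) % table_size) % n_queues


if __name__ == "__main__":
    fwd = rss_queue("192.168.1.10", "93.184.216.34", 49329, 443)
    rev = rss_queue("93.184.216.34", "192.168.1.10", 443, 49329)
    print("forward queue:", fwd, "reverse queue:", rev)
```

Because the key repeats every 16 bits and the address and port fields are offset from each other by a multiple of 16 bits, swapping source and destination selects the same set of key windows, so both directions land on the same queue. Note this only models packets received by the NIC; it says nothing about packets the router transmits.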

YAML pasted at the bottom of this post.

What I’m seeing: alert “SURICATA STREAM pkt seen on wrong thread” occurring in the eve log once or twice every minute. Stats show 6,621,026 packets processed since start, so this is not the highest volume system.

The majority of the errors I’ve noticed are traffic flows to/from services on the router itself, e.g. my laptop connecting to the SSH admin port of the router. However, a few other flows seem to trigger the error, for example:

{"timestamp":"2024-08-17T23:47:17.177241+0000","flow_id":770646896582810,"in_iface":"eth6","event_type":"alert","src_ip":"2600:1406:2e00:0019:0000:0000:6010:3724","src_port":443,"dest_ip":"2601:xxxx:xxxx:2651:9824:9acb:0f89:aec1","dest_port":49329,"proto":"TCP","alert":{"action":"allowed","gid":1,"signature_id":2210059,"rev":1,"signature":"SURICATA STREAM pkt seen on wrong thread","category":"","severity":3},"tls":{"sni":"iphone-ld.apple.com","version":"UNDETERMINED","ja3":{"hash":"773906b0efdefa24a7f2b8eb6985bf37","string":"771,4865-4866-4867-49196-49195-52393-49200-49199-52392-49162-49161-49172-49171-157-156-53-47-49160-49170-10,0-23-65281-10-11-16-5-13-18-51-45-43-27-21,29-23-24-25,0"},"ja3s":{}},"app_proto":"tls","flow":{"pkts_toserver":3,"pkts_toclient":2,"bytes_toserver":787,"bytes_toclient":180,"start":"2024-08-17T23:47:16.893082+0000"}}
{"timestamp":"2024-08-17T23:47:16.941185+0000","flow_id":770646896582810,"in_iface":"eth6","event_type":"tls","src_ip":"2601:xxxx:xxxx:2651:9824:9acb:0f89:aec1","src_port":49329,"dest_ip":"2600:1406:2e00:0019:0000:0000:6010:3724","dest_port":443,"proto":"TCP","tls":{"sni":"iphone-ld.apple.com","version":"TLS 1.3","ja3":{"hash":"773906b0efdefa24a7f2b8eb6985bf37","string":"771,4865-4866-4867-49196-49195-52393-49200-49199-52392-49162-49161-49172-49171-157-156-53-47-49160-49170-10,0-23-65281-10-11-16-5-13-18-51-45-43-27-21,29-23-24-25,0"},"ja3s":{"hash":"15af977ce25de452b96affa2addb1036","string":"771,4866,43-51"}}}

(2601:xxxx:xxxx:2651:9824:9acb:0f89:aec1 is the address of an iOS client on the LAN side)
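
To see whether the alerts cluster in one direction, something like the following rough Python sketch could tally them from the eve log. The signature ID 2210059 is taken from the alert above; the function name `tally_wrong_thread`, the simple string-prefix test for "LAN side", and the log path in the usage example are assumptions for illustration.

```python
import json
from collections import Counter

WRONG_THREAD_SID = 2210059  # "SURICATA STREAM pkt seen on wrong thread"


def tally_wrong_thread(lines, lan_prefix):
    """Count wrong-thread alerts by direction relative to a LAN address prefix.

    lan_prefix is matched as a plain string prefix against the address text
    exactly as it appears in the log (an assumption, not real subnet matching).
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip partially written or corrupt lines
        if event.get("event_type") != "alert":
            continue
        if event.get("alert", {}).get("signature_id") != WRONG_THREAD_SID:
            continue
        if event.get("dest_ip", "").startswith(lan_prefix):
            counts["to_lan"] += 1
        elif event.get("src_ip", "").startswith(lan_prefix):
            counts["from_lan"] += 1
        else:
            counts["other"] += 1
    return counts


if __name__ == "__main__":
    # Path and prefix are placeholders; adjust to the local setup.
    with open("/var/log/suricata/eve.json") as fh:
        print(tally_wrong_thread(fh, "192.168.1."))
```

In the example alert above, the packet that triggered the alert has the LAN client as its destination, so it would be counted under `to_lan`.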

A few questions/assumptions I wanted to check:

  1. Since this is a router, packets are both being transmitted on the interface by the system (IP forwarding or local services on the router itself) and received on the interface (from clients on the LAN). It makes sense how symmetric RSS on the NIC and one worker bound to each queue should allocate received packets to the correct worker every time, but what controls which worker sees transmitted packets?
  2. Am I configuring my X550 NIC correctly for the symmetric RSS function? It seems that without configuring ntuple on and setting the appropriate hkey, I receive a lot more “pkt seen on wrong thread” errors.
  3. Is the af-packet approach the right direction for high performance IDS on a router appliance? For a lot of reasons af-packet might be hard to use if we want to expand to supporting IPS in the future, so I would be open to arguments that my time would be better spent exploring an nftables implementation.

Thanks,
Lucas

suricata.yaml (45.1 KB)

Hi Lucas,

  1. I think on the af-packet side this can be handled in software using RPS (Receive Packet Steering), a software substitute for RSS.
  2. Since you have close to zero “pkt seen on wrong thread” alerts, I believe you configured that correctly. It is OK to see a few packets here and there on the wrong thread, but I am not sure what the cause might be.
  3. It depends on the technologies used. Personally, if you need to combine multiple applications and there is no dead end in terms of integration, I would choose AF-Packet. If your current stack works better with nfqueue, I would choose that. If you don’t have high-throughput deployments (less than 10 Gbps), I would take whichever route is easiest.
    If I recall correctly, AF-Packet can perform better and is generally more widely used.

Thanks for the reply and good to know we’re generally on the right track. I certainly know of some users who are running 10 Gbps NICs with sustained traffic, so I’m definitely interested in seeing how much performance can be squeezed out.

I did do some testing around RPS and confirmed it performs similarly to RSS if we use cluster_cpu instead of cluster_qm. However, the number of “pkt seen on wrong thread” alerts is roughly the same in both cases. I think these are still concerning, as they seem to occur reliably on inbound flows (WAN to LAN), which are small in my test environment but may be larger in other environments.

When Linux forwards a packet, I assume the forwarded packet comes in on some RX queue on the WAN NIC, then gets processed on some CPU based on the interrupt config, RPS/RFS, etc. Where I get fuzzy is whether the CPU that processes the inbound packet is also guaranteed to be the one that transmits the forwarded packet on the LAN NIC (I’m not aware of any kernel documentation on this). There is also the XPS (Transmit Packet Steering) mechanism, which might come into play here?
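
For anyone wanting to check what is currently configured, the per-queue steering masks are exposed in sysfs as `rps_cpus` (per RX queue) and `xps_cpus` (per TX queue). A small Python sketch for dumping them is below. The function names are hypothetical and `eth1` is just the interface name from my ethtool command; it only reports the masks and does not by itself answer which CPU ends up transmitting a forwarded packet.

```python
from pathlib import Path


def parse_cpumask(text: str) -> list[int]:
    """Decode a sysfs CPU bitmap such as '00000000,00000fff' into CPU numbers.

    The mask is hex, written as comma-separated 32-bit groups with the most
    significant group first.
    """
    digits = text.strip().replace(",", "")
    if not digits:
        return []
    value = int(digits, 16)
    return [cpu for cpu in range(value.bit_length()) if (value >> cpu) & 1]


def read_queue_masks(dev: str) -> dict[str, list[int]]:
    """Return {queue name: CPUs} for every rx-*/rps_cpus and tx-*/xps_cpus."""
    masks = {}
    queues = Path("/sys/class/net") / dev / "queues"
    for pattern in ("rx-*/rps_cpus", "tx-*/xps_cpus"):
        for path in sorted(queues.glob(pattern)):
            try:
                masks[path.parent.name] = parse_cpumask(path.read_text())
            except OSError:
                continue  # file may be unreadable on some configurations
    return masks


if __name__ == "__main__":
    for queue, cpus in read_queue_masks("eth1").items():
        print(f"{queue}: {cpus if cpus else 'not set'}")
```

An empty list for a queue means the mask is zero, i.e. RPS or XPS has not been configured for that queue.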