I have a cluster_flow deployment of Suricata where this counter increments regularly, and it appears to track kernel_drops proportionally, which I’ve been trying to eliminate (the drop rate is below what the box should be able to handle).
Because of this, I stumbled on this:
And opted to try configuring cluster_qm with RSS on a separate test system using X710 NICs, to see how it behaves. Prior to this, that system was configured with cluster_flow (fairly similar to the production box that was dropping) and wasn’t really seeing the pkt_on_wrong_thread counter increment.
However, after following these steps: https://suricata.readthedocs.io/en/suricata-5.0.3/performance/high-performance-config.html
The box now regularly sees pkt_on_wrong_thread incrementing… I feel like I must’ve done something wrong, but I believe I followed the steps correctly.
I thought that with cluster_qm the balancing was done by the NIC, so each worker thread reads from its own RSS queue, which the NIC fills in hardware using a hash of the 5-tuple… is that correct?
If so, how could a packet end up on the wrong thread!?
The first link above raises concerns about tunneled traffic and how it gets balanced, but my traffic mix is entirely IPv4 with TCP and UDP. No tunnels.
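For context, here’s roughly what I ended up with after following that guide. The interface name, queue count, cluster-id, and ring size below are placeholders rather than my exact values, so treat this as a sketch:

```
# RSS setup on the X710 per the high-performance guide (eth1 / 8 queues are placeholders)
ethtool -L eth1 combined 8     # one RSS queue per worker thread
ethtool -K eth1 rxhash on      # enable receive hashing
# low-entropy key from the guide, so hashing is symmetric across both directions of a flow
ethtool -X eth1 hkey 6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A equal 8
```

The af-packet section then uses cluster_qm so each worker is tied to one of those queues:

```yaml
af-packet:
  - interface: eth1
    threads: 8                # one worker per RSS queue
    cluster-id: 99
    cluster-type: cluster_qm  # fanout keyed on the RSS queue the packet arrived on
    use-mmap: yes
    ring-size: 200000
```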
Do you mind sharing details of the NIC config (for cluster_qm), the "ethtool -x" output, the Suricata version, and the kernel version?
In my test cases the results were the reverse, so it will be very interesting to see why and what the difference is.
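Something along these lines should capture it (eth1 below is just an example interface name):

```
ethtool -x eth1                      # RSS indirection table and hash key
ethtool -l eth1                      # configured channel/queue counts
ethtool -n eth1 rx-flow-hash tcp4    # which header fields feed the hash for TCP over IPv4
suricata -V                          # Suricata version
uname -r                             # kernel version
```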
Hmm… this probably also explains the correlation to drops.
What is “the rollover option”? I’d like to understand further.
Is this still possible when using RSS and cluster_qm? (In that mode I thought the balancing was done entirely by the NIC. Will Suricata intentionally rebalance under certain scenarios?)
Short version: treat the rollover option as if it did not exist and forget about it.
Long version: the rollover option was added to the af_packet socket so the kernel could avoid dropping packets when a socket queue is full. If the option is activated, the kernel sends the packet to the next socket in the fanout set whenever the socket it should have gone to has a full queue. This prevents the drop, BUT it means the packet does not reach the socket it was supposed to. On the Suricata side, that means it reaches the wrong thread and causes a mess.
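In suricata.yaml terms, as far as I understand it, that means making sure the af-packet entry keeps rollover off (interface name and counts below are placeholders):

```yaml
af-packet:
  - interface: eth1
    threads: 8
    cluster-id: 99
    cluster-type: cluster_qm
    use-mmap: yes
    rollover: no   # do not let the kernel divert packets to another socket when a queue is full
```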
Getting back to this: it sounds like you’re recommending disabling the rollover option.
I’d like to understand how this affects Suricata though. Can you elaborate on how it “causes a mess” in Suricata?
Is the processing of these packets heavier?
I’ve noticed that once we start to see Suricata drops they often ramp up very heavily, so I’m wondering if this might play a role (i.e., if the handling of packets on the wrong thread is heavier, it could create further backpressure on the queues and make further drops even more likely).
In looking into this, I found that the value rolls over fairly often. Not sure if this is expected:
tcp.pkt_on_wrong_thread | Total | 3250877
tcp.pkt_on_wrong_thread | Total | 4314104
tcp.pkt_on_wrong_thread | Total | 5015413
tcp.pkt_on_wrong_thread | Total | 5528283
tcp.pkt_on_wrong_thread | Total | 6035067
tcp.pkt_on_wrong_thread | Total | 6479016
tcp.pkt_on_wrong_thread | Total | 8135119
tcp.pkt_on_wrong_thread | Total | 9836991
tcp.pkt_on_wrong_thread | Total | 10709514
tcp.pkt_on_wrong_thread | Total | 11263319
tcp.pkt_on_wrong_thread | Total | 11748549
tcp.pkt_on_wrong_thread | Total | 12273358
tcp.pkt_on_wrong_thread | Total | 12465899
tcp.pkt_on_wrong_thread | Total | 92700
In any event, it seems like I’m seeing about 2.4% of packets flagged as on the wrong thread, so it’s not especially high. I’m just confused by it, since the balancing is offloaded to the NIC… I think that implies Suricata and the NIC may have different notions of what a flow is in certain circumstances.
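For what it’s worth, here’s the rough arithmetic behind that 2.4% figure, assuming the stats.log layout shown above and using decoder.pkts as the denominator (counter names can differ between versions, so this is only a sketch):

```
# crude estimate of the wrong-thread percentage from the latest stats.log dump
wrong=$(grep 'tcp.pkt_on_wrong_thread' stats.log | tail -n1 | awk -F'|' '{gsub(/ /,"",$3); print $3}')
total=$(grep 'decoder.pkts' stats.log | tail -n1 | awk -F'|' '{gsub(/ /,"",$3); print $3}')
echo "scale=2; 100 * $wrong / $total" | bc
```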
Sorry to revive this ancient thread, but I ran into the same issue for a completely different reason. Hopefully this helps someone searching for this problem in the future:
This can also be caused by duplicate configuration entries for the same interface in suricata.yaml, for example when the stock "default" interface entry is changed to the name of the primary capture interface while that interface already has its own entry.
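To illustrate (interface names here are hypothetical), the af-packet list ends up with two entries for the same interface, something like:

```yaml
af-packet:
  - interface: eth1           # the interface's own entry
    cluster-id: 99
    cluster-type: cluster_flow
    threads: auto
  # ... further down, the stock "- interface: default" entry was edited to the
  # same name, so eth1 is now configured twice:
  - interface: eth1
    cluster-id: 98
    cluster-type: cluster_flow
```

With two entries for the same interface, it effectively gets configured twice, which is the duplicate-configuration situation described above.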