Tcp.pkt_on_wrong_thread

I’m confused by what this counter is saying.

I have a cluster_flow deployment of Suricata that sees this counter incrementing regularly, and it appears to follow a pattern proportional to kernel_drops, which I’ve been trying to eliminate (the drop rate is below what the box should be able to handle).

Because of this, I stumbled on this:

And opted to try configuring cluster_qm with RSS on a different/test system using X710s, to see how it behaves.
Prior to doing this, the system was configured with cluster_flow (fairly similarly to the production box which was dropping), and wasn’t really seeing any pkt_on_wrong_thread counters.
However, after following these steps:


The box now regularly sees pkt_on_wrong_thread incrementing… I feel like I must’ve done something wrong, but I believe I followed the steps correctly.

I thought that, with cluster_qm, the balancing was done by the NIC, so each worker thread would read from its own RSS queue, fed by the NIC balancing in hardware using a hash of the 5-tuple… is this correct?
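(For reference, my mental model of the Suricata side of cluster_qm is an af-packet section roughly like the following. This is a sketch only; the interface name, thread count, and cluster-id are illustrative values from my setup, not a recommendation.)

```yaml
af-packet:
  - interface: lan0
    threads: 12               # one worker per RSS queue
    cluster-id: 99
    cluster-type: cluster_qm  # trust the NIC's RSS queue assignment
    defrag: 'yes'
    use-mmap: 'yes'
    tpacket-v3: 'yes'
```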

If so, how could a packet be balanced to the wrong thread?!

On the first link above there’s concern over tunneling and how tunnels are balanced, but my traffic mix is entirely IPv4 with TCP and UDP. No tunnels.
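(For reference, the NIC-side prep for cluster_qm on the X710 was along these lines. This is an illustrative sketch of the kind of commands involved, not a verbatim transcript; exact flag support and key length vary by driver and firmware.)

```shell
# one RSS queue per Suricata worker
ethtool -L lan0 combined 12

# low-entropy symmetric Toeplitz key (52 bytes on the X710), so both
# directions of a flow hash to the same queue
ethtool -X lan0 hkey \
6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:\
6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A \
equal 12

# hash TCP/UDP over IPv4 on the 4-tuple
ethtool -N lan0 rx-flow-hash tcp4 sdfn
ethtool -N lan0 rx-flow-hash udp4 sdfn
```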

Thanks,
Jeff

Do you mind sharing details of the NIC config (for cluster_qm), the ethtool -x output, the Suricata version, and the kernel level?
In my test cases the results were reversed, so it will be very interesting to see why, and what the difference is.

Indeed, seems opposite, so I’m confused. I wonder if I’m somehow seeing the inverse side of the balancing issue you were seeing?

Using Suricata 4.1.4 (rev b68e2e7) (will try with 6.0 soon)
Kernel: 4.15.0-74-generic

from suricata.yaml:
af-packet:
  - block-size: 1048576
    bypass: 'yes'
    checksum-checks: 'no'
    cluster-id: 99
    cluster-type: cluster_flow
    defrag: 'yes'
    interface: lan0
    ring-size: 1500000
    rollover: 'yes'
    threads: 12
    tpacket-v3: 'yes'
    use-mmap: 'yes'
    xdp-mode: driver

$ ethtool -x lan0
RX flow hash indirection table for lan0 with 12 RX ring(s):
    0:      0     1     2     3     4     5     6     7
    8:      8     9    10    11     0     1     2     3
   16:      4     5     6     7     8     9    10    11
   24:      0     1     2     3     4     5     6     7
   32:      8     9    10    11     0     1     2     3
   40:      4     5     6     7     8     9    10    11
   48:      0     1     2     3     4     5     6     7
   56:      8     9    10    11     0     1     2     3
   64:      4     5     6     7     8     9    10    11
   72:      0     1     2     3     4     5     6     7
   80:      8     9    10    11     0     1     2     3
   88:      4     5     6     7     8     9    10    11
   96:      0     1     2     3     4     5     6     7
  104:      8     9    10    11     0     1     2     3
  112:      4     5     6     7     8     9    10    11
  120:      0     1     2     3     4     5     6     7
  128:      8     9    10    11     0     1     2     3
  136:      4     5     6     7     8     9    10    11
  144:      0     1     2     3     4     5     6     7
  152:      8     9    10    11     0     1     2     3
  160:      4     5     6     7     8     9    10    11
  168:      0     1     2     3     4     5     6     7
  176:      8     9    10    11     0     1     2     3
  184:      4     5     6     7     8     9    10    11
  192:      0     1     2     3     4     5     6     7
  200:      8     9    10    11     0     1     2     3
  208:      4     5     6     7     8     9    10    11
  216:      0     1     2     3     4     5     6     7
  224:      8     9    10    11     0     1     2     3
  232:      4     5     6     7     8     9    10    11
  240:      0     1     2     3     4     5     6     7
  248:      8     9    10    11     0     1     2     3
  256:      4     5     6     7     8     9    10    11
  264:      0     1     2     3     4     5     6     7
  272:      8     9    10    11     0     1     2     3
  280:      4     5     6     7     8     9    10    11
  288:      0     1     2     3     4     5     6     7
  296:      8     9    10    11     0     1     2     3
  304:      4     5     6     7     8     9    10    11
  312:      0     1     2     3     4     5     6     7
  320:      8     9    10    11     0     1     2     3
  328:      4     5     6     7     8     9    10    11
  336:      0     1     2     3     4     5     6     7
  344:      8     9    10    11     0     1     2     3
  352:      4     5     6     7     8     9    10    11
  360:      0     1     2     3     4     5     6     7
  368:      8     9    10    11     0     1     2     3
  376:      4     5     6     7     8     9    10    11
  384:      0     1     2     3     4     5     6     7
  392:      8     9    10    11     0     1     2     3
  400:      4     5     6     7     8     9    10    11
  408:      0     1     2     3     4     5     6     7
  416:      8     9    10    11     0     1     2     3
  424:      4     5     6     7     8     9    10    11
  432:      0     1     2     3     4     5     6     7
  440:      8     9    10    11     0     1     2     3
  448:      4     5     6     7     8     9    10    11
  456:      0     1     2     3     4     5     6     7
  464:      8     9    10    11     0     1     2     3
  472:      4     5     6     7     8     9    10    11
  480:      0     1     2     3     4     5     6     7
  488:      8     9    10    11     0     1     2     3
  496:      4     5     6     7     8     9    10    11
  504:      0     1     2     3     4     5     6     7
RSS hash key:
6d:5a:6d:5a:6d:5a:6d:5a:6d:5a:6d:5a:6d:5a:6d:5a:6d:5a:6d:5a:6d:5a:6d:5a:6d:5a:6d:5a:6d:5a:6d:5a:6d:5a:6d:5a:6d:5a:6d:5a:6d:5a:6d:5a:6d:5a:6d:5a:6d:5a:6d:5a
RSS hash function:
    toeplitz: on
    xor: off
    crc32: off
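(Side note on the key shown above: the repeating 6d:5a bytes are the low-entropy symmetric Toeplitz key. Because the key repeats every 16 bits, shifting the hash input by a whole IP address (32 bits) or port (16 bits) selects the same key windows, so swapping source and destination gives the same hash and both directions of a flow should land on the same queue. A toy sketch of the standard Toeplitz hash to illustrate; this is my own illustration, not Suricata or driver code.)

```python
def toeplitz_hash(data: bytes, key: bytes) -> int:
    """Standard Toeplitz hash: for every set bit of the input, XOR in
    the 32-bit window of the key starting at that bit position."""
    key_int = int.from_bytes(key, "big")
    key_bits = len(key) * 8
    h = 0
    for i, byte in enumerate(data):
        for b in range(8):
            if byte & (0x80 >> b):
                h ^= (key_int >> (key_bits - 32 - (i * 8 + b))) & 0xFFFFFFFF
    return h

# The 52-byte low-entropy key from the ethtool output above
SYM_KEY = bytes.fromhex("6d5a" * 26)

def flow_bytes(src_ip, dst_ip, sport, dport):
    # IPv4 TCP/UDP hash input: src ip, dst ip, src port, dst port
    return (bytes(src_ip) + bytes(dst_ip)
            + sport.to_bytes(2, "big") + dport.to_bytes(2, "big"))

fwd = flow_bytes([10, 0, 0, 1], [10, 0, 0, 2], 12345, 443)
rev = flow_bytes([10, 0, 0, 2], [10, 0, 0, 1], 443, 12345)
assert toeplitz_hash(fwd, SYM_KEY) == toeplitz_hash(rev, SYM_KEY)
print("symmetric: both directions hash to", hex(toeplitz_hash(fwd, SYM_KEY)))
```

With a generic (non-periodic) key, the two directions would generally hash to different values and could land on different queues, hence different workers.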

Can you try lowering the ring size to 200000? And please test with 6.0 if possible.

I still see tcp.pkt_on_wrong_thread when using Suricata 6.0.

After reducing the ring-size, with my test traffic, I saw the counter increment more (roughly 1.5-1.7x more).

The rollover option is possibly contributing to this issue. It breaks flow-based load balancing under high load.

Good eyes - thank you for spotting it!

Hmm… this probably also explains the correlation to drops.
What is “the rollover option”? I’d like to understand further.
Is this still possible when using RSS and cluster_qm? (In that mode, I thought the balancing was done entirely by the NIC? Will suricata intentionally rebalance under certain scenarios?)

Thanks,
Jeff

Short version: treat the rollover option as if it does not exist; forget about it.

Long version: The rollover option was added to the af_packet socket to avoid the kernel dropping packets when a socket queue is full. If the option is activated, the kernel sends a packet to the next socket in the fanout set whenever the socket it should have gone to has a full queue. This prevents drops, BUT it means the packet will not reach the socket it was supposed to. On the Suricata side, this means it will reach the wrong thread and cause a mess.
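To make that concrete, here is a toy model of the queueing logic just described (my own sketch, nothing to do with the actual kernel code): each flow has a "home" socket chosen by hash; when that socket's queue is full, rollover delivers the packet to the next socket with room instead of dropping it, and every such delivery is a packet on the wrong thread.

```python
QUEUE_CAP = 4      # toy per-socket queue capacity
NUM_WORKERS = 3    # toy fanout set size

def fanout(packets, rollover):
    """Deliver packets (given as flow ids) to worker queues.

    Returns (wrong_thread, dropped)."""
    queues = [[] for _ in range(NUM_WORKERS)]
    wrong_thread = dropped = 0
    for flow_id in packets:
        home = flow_id % NUM_WORKERS  # stand-in for the flow hash
        target = home
        if len(queues[home]) >= QUEUE_CAP:
            if not rollover:
                dropped += 1          # rollover off: queue full => drop
                continue
            for step in range(1, NUM_WORKERS):
                cand = (home + step) % NUM_WORKERS
                if len(queues[cand]) < QUEUE_CAP:
                    target = cand     # rollover on: next socket with room
                    break
            else:
                dropped += 1          # every queue full: drop anyway
                continue
        queues[target].append(flow_id)
        if target != home:
            wrong_thread += 1
    return wrong_thread, dropped

# A burst of 10 packets from a single flow against a queue of 4:
print(fanout([0] * 10, rollover=True))   # no drops, but 6 on the wrong thread
print(fanout([0] * 10, rollover=False))  # 6 drops, none misplaced
```

With rollover, the burst is absorbed without drops, but six packets land on queues whose workers do not own the flow; without it, those same six packets are simply dropped, which is the trade-off described above.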

Getting back to this; it sounds like you’re recommending disabling the rollover option.
I’d like to understand how this affects Suricata though. Can you elaborate on how it “causes a mess” in Suricata?
Is the processing of these packets heavier?
I’ve noticed that once we start to see Suricata drops, they often ramp up very heavily, so I’m wondering if this might play a role (i.e., if the handling of packets on the wrong thread is heavier, it could create further backpressure on the queues and make further drops even more likely).

Thanks,
Jeff

Yes, you need to disable rollover indeed.

Check slide 11 here for an explanation of the chaos induced by rollover (and packets on the wrong thread):

The wrong-thread issue will cause the accuracy of protocol analysis and detection to drop dramatically.

Makes sense.
Unfortunately, even with:

rollover: 'no'

I’m still seeing tcp.pkt_on_wrong_thread incrementing.
What’s the best way to debug how/why that is occurring?

What does it look like percentage-wise, relative to the total?

In looking into this, I found that the value rolls over fairly often. Not sure if this is expected:

tcp.pkt_on_wrong_thread                       | Total                     | 3250877
tcp.pkt_on_wrong_thread                       | Total                     | 4314104
tcp.pkt_on_wrong_thread                       | Total                     | 5015413
tcp.pkt_on_wrong_thread                       | Total                     | 5528283
tcp.pkt_on_wrong_thread                       | Total                     | 6035067
tcp.pkt_on_wrong_thread                       | Total                     | 6479016
tcp.pkt_on_wrong_thread                       | Total                     | 8135119
tcp.pkt_on_wrong_thread                       | Total                     | 9836991
tcp.pkt_on_wrong_thread                       | Total                     | 10709514
tcp.pkt_on_wrong_thread                       | Total                     | 11263319
tcp.pkt_on_wrong_thread                       | Total                     | 11748549
tcp.pkt_on_wrong_thread                       | Total                     | 12273358
tcp.pkt_on_wrong_thread                       | Total                     | 12465899
tcp.pkt_on_wrong_thread                       | Total                     | 92700

In any event, it seems I’m seeing about 2.4% of packets listed as on the wrong thread, so it’s not especially high; I’m just confused by it, since the balancing is offloaded to the NIC. I think that implies Suricata and the NIC may have different notions of what a flow is in certain circumstances.

Can you please post the last full update in stats.log (not just the wrong-thread counter)?