tcp.pkt_on_wrong_thread

I’m confused by what this counter is saying.

I have a cluster_flow deployment of Suricata where this counter increments regularly, and it appears to follow a pattern proportional to kernel_drops, which I've been trying to eliminate (the drops are under what the box should be able to handle).

Because of this, I stumbled on this:

And opted to try configuring cluster_qm with RSS on a separate test system using X710s, to see how it behaves.
Prior to doing this, that system was configured with cluster_flow (fairly similar to the production box which was dropping), and wasn't really seeing pkt_on_wrong_thread increment at all.
However, after following these steps:
https://suricata.readthedocs.io/en/suricata-5.0.3/performance/high-performance-config.html
the box now regularly sees pkt_on_wrong_thread incrementing… I feel like I must have done something wrong, but I believe I followed the steps correctly.

I thought that, with cluster_qm, the balancing was done by the NIC, so each worker thread would read from its own RSS queue, fed by the NIC balancing in hardware using a hash of the 5-tuple… is this correct?
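For reference, the setup I followed from the guide linked above looks roughly like this (a sketch only, not my exact commands; lan0, the 12 queues/threads, and cluster-id 99 mirror my existing config, and the guide's remaining tuning steps such as coalescing and ring sizes are omitted):

$ ethtool -L lan0 combined 12    # one RSS queue per Suricata worker thread
$ ethtool -K lan0 rxhash on
$ ethtool -K lan0 ntuple on
$ ethtool -X lan0 hkey 6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A equal 12    # symmetric low-entropy key (the repeating 6d:5a pattern visible in the ethtool -x output further down)

af-packet:
  - interface: lan0
    threads: 12
    cluster-id: 99
    cluster-type: cluster_qm     # let the NIC's RSS hash decide which worker sees a packet
    use-mmap: yes
    tpacket-v3: yes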

If so, how could a packet be balanced to the wrong thread?

In the first link above there's concern about tunneled traffic and how it's balanced, but my traffic mix is entirely IPv4 with TCP and UDP. No tunnels.

Thanks,
Jeff

Do you mind sharing details of the NIC config (for cluster_qm), the "ethtool -x" output, the Suricata version, and the kernel level?
In my test cases the results were reversed, so it will be very interesting to see why and what the difference is.
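For example (assuming the capture interface is lan0), the output of:

$ ethtool -x lan0
$ ethtool -n lan0 rx-flow-hash tcp4
$ suricata --build-info
$ uname -r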

Indeed, it seems opposite, so I'm confused. I wonder if I'm somehow seeing the inverse of the balancing issue you were seeing?

Using Suricata 4.1.4 (rev b68e2e7) (will try with 6.0 soon)
Kernel: 4.15.0-74-generic

from suricata.yaml:
af-packet:
  - block-size: 1048576
    bypass: 'yes'
    checksum-checks: 'no'
    cluster-id: 99
    cluster-type: cluster_flow
    defrag: 'yes'
    interface: lan0
    ring-size: 1500000
    rollover: 'yes'
    threads: 12
    tpacket-v3: 'yes'
    use-mmap: 'yes'
    xdp-mode: driver

$ ethtool -x lan0
RX flow hash indirection table for lan0 with 12 RX ring(s):
    0:      0     1     2     3     4     5     6     7
    8:      8     9    10    11     0     1     2     3
   16:      4     5     6     7     8     9    10    11
   24:      0     1     2     3     4     5     6     7
   32:      8     9    10    11     0     1     2     3
   40:      4     5     6     7     8     9    10    11
   48:      0     1     2     3     4     5     6     7
   56:      8     9    10    11     0     1     2     3
   64:      4     5     6     7     8     9    10    11
   72:      0     1     2     3     4     5     6     7
   80:      8     9    10    11     0     1     2     3
   88:      4     5     6     7     8     9    10    11
   96:      0     1     2     3     4     5     6     7
  104:      8     9    10    11     0     1     2     3
  112:      4     5     6     7     8     9    10    11
  120:      0     1     2     3     4     5     6     7
  128:      8     9    10    11     0     1     2     3
  136:      4     5     6     7     8     9    10    11
  144:      0     1     2     3     4     5     6     7
  152:      8     9    10    11     0     1     2     3
  160:      4     5     6     7     8     9    10    11
  168:      0     1     2     3     4     5     6     7
  176:      8     9    10    11     0     1     2     3
  184:      4     5     6     7     8     9    10    11
  192:      0     1     2     3     4     5     6     7
  200:      8     9    10    11     0     1     2     3
  208:      4     5     6     7     8     9    10    11
  216:      0     1     2     3     4     5     6     7
  224:      8     9    10    11     0     1     2     3
  232:      4     5     6     7     8     9    10    11
  240:      0     1     2     3     4     5     6     7
  248:      8     9    10    11     0     1     2     3
  256:      4     5     6     7     8     9    10    11
  264:      0     1     2     3     4     5     6     7
  272:      8     9    10    11     0     1     2     3
  280:      4     5     6     7     8     9    10    11
  288:      0     1     2     3     4     5     6     7
  296:      8     9    10    11     0     1     2     3
  304:      4     5     6     7     8     9    10    11
  312:      0     1     2     3     4     5     6     7
  320:      8     9    10    11     0     1     2     3
  328:      4     5     6     7     8     9    10    11
  336:      0     1     2     3     4     5     6     7
  344:      8     9    10    11     0     1     2     3
  352:      4     5     6     7     8     9    10    11
  360:      0     1     2     3     4     5     6     7
  368:      8     9    10    11     0     1     2     3
  376:      4     5     6     7     8     9    10    11
  384:      0     1     2     3     4     5     6     7
  392:      8     9    10    11     0     1     2     3
  400:      4     5     6     7     8     9    10    11
  408:      0     1     2     3     4     5     6     7
  416:      8     9    10    11     0     1     2     3
  424:      4     5     6     7     8     9    10    11
  432:      0     1     2     3     4     5     6     7
  440:      8     9    10    11     0     1     2     3
  448:      4     5     6     7     8     9    10    11
  456:      0     1     2     3     4     5     6     7
  464:      8     9    10    11     0     1     2     3
  472:      4     5     6     7     8     9    10    11
  480:      0     1     2     3     4     5     6     7
  488:      8     9    10    11     0     1     2     3
  496:      4     5     6     7     8     9    10    11
  504:      0     1     2     3     4     5     6     7
RSS hash key:
6d:5a:6d:5a:6d:5a:6d:5a:6d:5a:6d:5a:6d:5a:6d:5a:6d:5a:6d:5a:6d:5a:6d:5a:6d:5a:6d:5a:6d:5a:6d:5a:6d:5a:6d:5a:6d:5a:6d:5a:6d:5a:6d:5a:6d:5a:6d:5a:6d:5a:6d:5a
RSS hash function:
    toeplitz: on
    xor: off
    crc32: off

Can you try lowering the ring size to 200000? And please test with 6.0 if possible.
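That is, only the one value changes in the af-packet section you posted (sketch):

af-packet:
  - interface: lan0
    ring-size: 200000    # down from 1500000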

I still see tcp.pkt_on_wrong_thread when using Suricata 6.0.

After reducing the ring-size, for my test traffic, I saw the counter increment more (roughly 1.5-1.7x more).

The rollover option is possibly contributing to this issue. It breaks flow-based load balancing under high load.

Good eyes - thank you for spotting it!

Hmm… this probably also explains the correlation with drops.
What is "the rollover option"? I'd like to understand it further.
Is this still possible when using RSS and cluster_qm? (In that mode, I thought the balancing was done entirely by the NIC. Will Suricata intentionally rebalance under certain scenarios?)

Thanks,
Jeff

Short version: pretend the rollover option does not exist, and forget about it.

Long version: The rollover option was added to the af_packet socket to avoid the kernel dropping packets when a socket's queue is full. If the option is activated, the kernel sends the packet to the next socket in the fanout set whenever the socket it should have gone to has a full queue. This prevents drops, BUT it means the packet does not reach the socket it was supposed to. On the Suricata side, this means it reaches the wrong thread and causes a mess.
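If keeping the flow-to-thread mapping intact matters more to you than the drops rollover prevents, the Suricata-side fix is simply to turn it off in the af-packet section (a sketch, reusing the interface name from the config above):

af-packet:
  - interface: lan0
    cluster-type: cluster_flow
    rollover: 'no'    # when a ring fills up, drop rather than divert packets to another worker's socket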

Getting back to this: it sounds like you're recommending disabling the rollover option.
I'd like to understand how this affects Suricata, though. Can you elaborate on how it "causes a mess" in Suricata?
Is the processing of these packets heavier?
I've noticed that once we start to see Suricata drops, they often ramp up very heavily, so I'm wondering if this might play a role (i.e., if handling packets on the wrong thread is heavier, it potentially creates further backpressure on the queues and makes further drops even more likely).

Thanks,
Jeff

Yes, you need to disable rollover indeed.

Check slide 11 here for an explanation of the chaos induced by rollover (and packets on the wrong thread):

The wrong-thread issue will cause the accuracy of protocol analysis and detection to drop dramatically.

Makes sense.
Unfortunately, even with:

rollover: 'no'

I’m still seeing tcp.pkt_on_wrong_thread incrementing.
What's the best way to debug how/why that is occurring?

What does it look like percentage-wise relative to the total?

In looking into this, I found that the value rolls over fairly often. Not sure if this is expected:

tcp.pkt_on_wrong_thread                       | Total                     | 3250877
tcp.pkt_on_wrong_thread                       | Total                     | 4314104
tcp.pkt_on_wrong_thread                       | Total                     | 5015413
tcp.pkt_on_wrong_thread                       | Total                     | 5528283
tcp.pkt_on_wrong_thread                       | Total                     | 6035067
tcp.pkt_on_wrong_thread                       | Total                     | 6479016
tcp.pkt_on_wrong_thread                       | Total                     | 8135119
tcp.pkt_on_wrong_thread                       | Total                     | 9836991
tcp.pkt_on_wrong_thread                       | Total                     | 10709514
tcp.pkt_on_wrong_thread                       | Total                     | 11263319
tcp.pkt_on_wrong_thread                       | Total                     | 11748549
tcp.pkt_on_wrong_thread                       | Total                     | 12273358
tcp.pkt_on_wrong_thread                       | Total                     | 12465899
tcp.pkt_on_wrong_thread                       | Total                     | 92700

In any event, it seems I'm seeing about 2.4% of packets listed as on the wrong thread, so it's not especially high; I'm just confused by it, since the balancing is offloaded to the NIC… I think that implies Suricata and the NIC may have a different notion of what a flow is in certain circumstances.

Can you please post the last full update in stats.log? (Not just the wrong-thread counter.)

Can do:

Date: 10/30/2020 -- 14:24:13 (uptime: 0d, 18h 16m 12s)
------------------------------------------------------------------------------------
Counter                                       | TM Name                   | Value
------------------------------------------------------------------------------------
capture.kernel_packets                        | Total                     | 506896627
capture.kernel_drops                          | Total                     | 44024312
decoder.pkts                                  | Total                     | 462872315
decoder.bytes                                 | Total                     | 346198739015
decoder.ipv4                                  | Total                     | 462872315
decoder.ethernet                              | Total                     | 462872315
decoder.tcp                                   | Total                     | 397101838
decoder.udp                                   | Total                     | 65770477
decoder.avg_pkt_size                          | Total                     | 747
decoder.max_pkt_size                          | Total                     | 1514
flow.tcp                                      | Total                     | 54555529
flow.udp                                      | Total                     | 6098484
flow.tcp_reuse                                | Total                     | 934803
flow.get_used                                 | Total                     | 6023770
flow.get_used_eval                            | Total                     | 1085091627
flow.get_used_eval_reject                     | Total                     | 798009403
flow.get_used_eval_busy                       | Total                     | 7452320
flow.get_used_failed                          | Total                     | 151989559
flow.wrk.spare_sync_avg                       | Total                     | 98
flow.wrk.spare_sync                           | Total                     | 476650
flow.wrk.spare_sync_incomplete                | Total                     | 13788
flow.wrk.spare_sync_empty                     | Total                     | 5542
flow.wrk.flows_evicted_needs_work             | Total                     | 681350
flow.wrk.flows_evicted_pkt_inject             | Total                     | 1359017
flow.wrk.flows_evicted                        | Total                     | 7986801
flow.wrk.flows_injected                       | Total                     | 566537
tcp.sessions                                  | Total                     | 3025688
tcp.syn                                       | Total                     | 3061601
tcp.synack                                    | Total                     | 3212453
tcp.pkt_on_wrong_thread                       | Total                     | 5693811
tcp.segment_memcap_drop                       | Total                     | 11221377
tcp.stream_depth_reached                      | Total                     | 429
tcp.reassembly_gap                            | Total                     | 8553836
tcp.overlap                                   | Total                     | 2261
tcp.insert_data_normal_fail                   | Total                     | 8277776
app_layer.flow.http                           | Total                     | 24953
app_layer.tx.http                             | Total                     | 47374
app_layer.flow.ftp                            | Total                     | 461
app_layer.tx.ftp                              | Total                     | 3258
app_layer.flow.smtp                           | Total                     | 863
app_layer.tx.smtp                             | Total                     | 1349
app_layer.flow.tls                            | Total                     | 1025
app_layer.flow.ssh                            | Total                     | 10906
app_layer.flow.smb                            | Total                     | 1144
app_layer.tx.smb                              | Total                     | 7177
app_layer.flow.failed_tcp                     | Total                     | 33162
app_layer.flow.dcerpc_udp                     | Total                     | 4
app_layer.flow.dns_udp                        | Total                     | 373879
app_layer.tx.dns_udp                          | Total                     | 661077
app_layer.flow.failed_udp                     | Total                     | 5724601
flow.mgr.full_hash_pass                       | Total                     | 3673
flow.spare                                    | Total                     | 221900
flow.emerg_mode_entered                       | Total                     | 5
flow.emerg_mode_over                          | Total                     | 5
flow.mgr.rows_maxlen                          | Total                     | 16
flow.mgr.flows_checked                        | Total                     | 192620501
flow.mgr.flows_notimeout                      | Total                     | 177794889
flow.mgr.flows_timeout                        | Total                     | 14825612
flow.mgr.flows_evicted                        | Total                     | 46643442
flow.mgr.flows_evicted_needs_work             | Total                     | 571411
tcp.memuse                                    | Total                     | 8246000
tcp.reassembly_memuse                         | Total                     | 101813796
http.memuse                                   | Total                     | 396324
ftp.memuse                                    | Total                     | 12
flow.memuse                                   | Total                     | 76818624

Assuming this is 6.0, can you try a run without xdp/bypass?
When you make the run, do you mind sharing a screenshot of htop?
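That is, something along these lines, based on the config posted earlier but with the bypass/XDP lines removed (a sketch, not a full tuned config):

af-packet:
  - interface: lan0
    threads: 12
    cluster-id: 99
    cluster-type: cluster_flow
    defrag: 'yes'
    checksum-checks: 'no'
    block-size: 1048576
    ring-size: 200000
    rollover: 'no'
    use-mmap: 'yes'
    tpacket-v3: 'yes'
    # bypass: 'yes' and xdp-mode: driver removed for this test run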

I'm currently using the pre-built Ubuntu packages for Suricata 6.0, which apparently don't have XDP enabled:

$ /usr/bin/suricata --build-info | grep -i xdp
  XDP support:                             no

I've attached an htop screenshot, but with some non-Suricata/proprietary processes blanked out.

There is a known issue with fragmented packets, so what you could do is try ethtool rx-flow-hash sd instead of rx-flow-hash sdfn on your NIC (i.e., hash on source/destination IP only rather than also including the ports).
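For example (assuming the interface is lan0 and the IPv4-only TCP/UDP traffic mix described above):

$ ethtool -N lan0 rx-flow-hash tcp4 sd
$ ethtool -N lan0 rx-flow-hash udp4 sd
$ ethtool -n lan0 rx-flow-hash tcp4    # verify: should now list only IP SA / IP DA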

Sorry to revive this ancient thread, but I ran into the same issue for a completely different reason. Hopefully this helps someone searching for this problem in the future:

This can also be caused by duplicate configurations for the same interface in suricata.yaml, for example if one changes the default interface entry to the primary interface, so the same interface ends up with two af-packet entries.
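For illustration (hypothetical values, not my actual config), something like this leaves two af-packet entries feeding the same interface; two sets of capture threads then read from it, and packets of one flow can show up on threads in either set:

af-packet:
  - interface: eth0            # the entry tuned for capture
    cluster-id: 99
    cluster-type: cluster_flow
  - interface: eth0            # the old "default" entry, edited to point at the same interface
    cluster-id: 98
    cluster-type: cluster_flow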
