NFQ offload performance implementation

Hi, I want to implement the NFQ offload mechanism described in the following article:

I'm interested in the NFQ section with the offloading mechanism.
I implemented the nftables rules and set "disable-offloading: false".
Then I ran the iperf test.
When I look at the nft ruleset, I don't see any packets matched by the "meta mark 0x00000001 accept" rule in the ips chain.
How can I confirm whether the mechanism works or not?

suricata-6.0.10$ sudo nft list ruleset
table ip filter {
        chain input {
                type filter hook input priority filter; policy accept;
                counter packets 322501 bytes 13657915295
                ip daddr 192.168.1.1-192.168.1.250 counter packets 0 bytes 0
                ip daddr 192.168.1.1-192.168.1.250 accept
                counter packets 322501 bytes 13657915295
        }

        chain ips {
                type filter hook input priority filter + 10; policy accept;
                counter packets 322501 bytes 13657915295
                meta mark set ct mark
                meta mark 0x00000001 accept
                counter packets 322501 bytes 13657915295
                queue to 0
                counter packets 0 bytes 0
        }

        chain conn_mark_save {
                type filter hook input priority 20; policy accept;
                counter packets 322473 bytes 13657910220
                ct mark set meta mark
                counter packets 322473 bytes 13657910220
        }
}
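One way to sanity-check this, given the ruleset above: the counter at the top of the ips chain and the counter right after the accept rule both report 322501 packets, so no packet has matched "meta mark 0x00000001 accept" yet, i.e. nothing is being bypassed. Re-listing the chain while traffic is running makes that visible, for example:

# re-list the ips chain every second while iperf3 is running; once bypass
# kicks in, the counter after "meta mark 0x00000001 accept" should fall
# behind the counter at the top of the chain
sudo watch -n 1 'nft list chain ip filter ips'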

What version are you using, how do you run it, and what does the config look like?

Suricata 6.0.10,
Ubuntu 22.04
$ ./configure --prefix=/usr --sysconfdir=/etc --localstatedir=/var --enable-nfqueue --enable-lua
The only change in suricata.yaml is:
disable-offloading: false
suricata-6.0.10$ sudo ./src/.libs/suricata -c /suricata.yaml -q 0
5/5/2023 -- 23:37:46 - <Notice> - This is Suricata version 6.0.10 RELEASE running in SYSTEM mode
5/5/2023 -- 23:37:51 - <Notice> - all 18 packet processing threads, 4 management threads initialized, engine started.

Did you also update the nfq part in the config?

nfq:
        mode: accept
        bypass-mark: 1
        bypass-mask: 1
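For context, the same snippet with comments on how it ties into the nft rules shown earlier (a sketch, not authoritative defaults; the values simply mirror the "meta mark 0x00000001" test in the ips chain):

nfq:
  mode: accept      # Suricata issues accept verdicts on queue 0
  bypass-mark: 1    # mark Suricata sets on the verdict when it decides to bypass a flow
  bypass-mask: 1    # bypass-mark/mask must line up with the "meta mark 0x00000001 accept" rule;
                    # conn_mark_save copies the mark to ct mark, the ips chain restores it, so
                    # later packets of a bypassed flow are accepted before they reach "queue to 0"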

Thanks,
I added it and ran it again. (I thought the defaults were the values shown in the comments; it's good to know that is not the case.)
I also added a debug print in NFQBypassCallback():
static int NFQBypassCallback(Packet *p)
{
    SCLogError(SC_BENOIT_INFO, "NFQBypassCallback");
But I never see it. (If I put a log in NFQCallBack, I do see it.)

To get some packets bypassed (only a few, fewer than 6, when running the iperf3 test), I set
stream.bypass: yes
stream.reassembly.depth: 128
Then I got some bypassed packets.
But when I ran the iperf3 test like the one in the article, I did not get any performance improvement.
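For reference, a sketch of where those two knobs live in suricata.yaml; the depth value here is only an illustration (the stock 6.0.x config ships with 1mb):

stream:
  bypass: yes       # once the reassembly depth is reached, the rest of the flow can be bypassed
  reassembly:
    depth: 1mb      # how much of each stream is reassembled and inspected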

How do you measure the performance?

I used iperf3 to measure performance.
iperf3 -c localhost -l=4M
sudo ./src/.libs/suricata -c _Notes/suricataBenoit.yaml -q 0
I got new results:
With stream.reassembly.depth = 48:
[ 5 ] 0.00 - 10.04 sec 12.5 GBytes 10.7 Gbits/sec receiver
and
With stream.reassembly.depth = 16:
[ 5 ] 0.00 - 10.04 sec 29.4 GBytes 25.1 Gbits/sec receiver
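To tie those numbers back to bypass activity, it may help to check (assuming the default log directory; the exact counter names vary a bit between versions):

# did any flows actually get bypassed during the run?
grep -i bypass /var/log/suricata/stats.log | tail
# and is the accept rule in the ips chain now matching packets?
sudo nft list chain ip filter ips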

I'm not sure about the exact meaning of stream.reassembly.depth.