Hello,
I have just enabled Hyperscan support in my Suricata 7.0.0, but I don’t see any improvement. I tested it with few rules (“rules_loaded”: 115) and with more rules (“rules_loaded”: 35203), and I don’t see any change in performance compared to the same Suricata 7.0.0 without Hyperscan support.
If --build-info shows Hyperscan, the build looks correct. You could also run Suricata with -vvvv and check whether the log output mentions Hyperscan.
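A quick way to script that check is to look for the “Hyperscan support:” line (the same line quoted further down in this thread) in the `suricata --build-info` output. This is a minimal sketch, assuming that line is printed as `Hyperscan support: yes|no` on stdout and that `suricata` is on PATH; `hyperscan_in_build_info` and `check_installed_suricata` are hypothetical helper names. A “yes” only shows that Hyperscan was compiled in, not which matcher the running configuration selected.

```python
import re
import subprocess


def hyperscan_in_build_info(build_info: str) -> bool:
    """Return True if a `suricata --build-info` dump reports Hyperscan support."""
    match = re.search(r"^\s*Hyperscan support:\s*(\S+)", build_info, re.MULTILINE)
    return bool(match) and match.group(1).lower() == "yes"


def check_installed_suricata(binary: str = "suricata") -> bool:
    # Runs the real binary; assumes it is on PATH and prints build info to stdout.
    result = subprocess.run([binary, "--build-info"], capture_output=True, text=True, check=True)
    return hyperscan_in_build_info(result.stdout)


if __name__ == "__main__":
    sample = "Suricata Configuration:\n  Hyperscan support:                       yes\n"
    print(hyperscan_in_build_info(sample))  # True
```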
How did you actually measure the performance impact, and are you certain that Hyperscan was not already enabled before? It is enabled by default if available.
In addition, the Intel Atom C3338 is a 2-core CPU without HT, which is not a very fast one, so the CPU itself could simply be the bottleneck.
The log does not mention the word “hyperscan”, but I can see debug messages from “mpm-hs”. I don’t know if this is enough to say that Suricata is using Hyperscan.
I have tested 2 different Suricata builds on the same board. The first one was compiled without Hyperscan support (Hyperscan support: no),
and I ran the test with it before installing Hyperscan on the board.
I tested several UDP traffic profiles (fixed frame 1528, fixed frame 85, iMix 420, …) and in all cases the maximum throughput decreases when I load more rules (drastically with 35203 loaded rules).
With more rules loaded (from 115 rules on), I can see the {W#01} and {W#02} threads taking the largest share of the CPU.
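To put numbers on which threads dominate, the per-thread %CPU can be pulled out of a `top -b -H -n <N>` capture and averaged over the snapshots. A rough sketch, assuming the procps-ng batch layout (a column header containing `%CPU` and `COMMAND`, data lines starting with a numeric PID); BusyBox top on embedded boards prints a different layout and would need another parser. `per_thread_cpu` is a hypothetical helper and the sample below is illustrative, not taken from the attached logs.

```python
from collections import defaultdict


def per_thread_cpu(top_output: str) -> dict:
    """Average %CPU per thread name over all snapshots in `top -b -H` output."""
    totals = defaultdict(float)
    snapshots = 0
    cpu_idx = cmd_idx = None
    for line in top_output.splitlines():
        fields = line.split()
        # Each snapshot has a column header; remember where %CPU and COMMAND are.
        if "%CPU" in fields and "COMMAND" in fields:
            cpu_idx, cmd_idx = fields.index("%CPU"), fields.index("COMMAND")
            snapshots += 1
            continue
        # Skip summary lines ("top - ...", "Threads: ...") and anything before a header.
        if cmd_idx is None or len(fields) <= cmd_idx or not fields[0].isdigit():
            continue
        try:
            cpu = float(fields[cpu_idx].replace(",", "."))  # some locales use a decimal comma
        except ValueError:
            continue
        totals[" ".join(fields[cmd_idx:])] += cpu
    if snapshots == 0:
        return {}
    return {name: total / snapshots for name, total in totals.items()}


if __name__ == "__main__":
    # Illustrative sample only.
    sample = """\
top - 10:00:00 up 1 day,  2:00,  0 users,  load average: 2.00, 2.00, 2.00
Threads:   3 total,   2 running,   1 sleeping,   0 stopped,   0 zombie
    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
   1201 root      20   0  900000 400000  20000 R  97.0  10.0   5:00.00 W#01
   1202 root      20   0  900000 400000  20000 R  95.0  10.0   5:00.00 W#02
   1210 root      20   0  900000 400000  20000 S   1.0  10.0   0:01.00 FM#01
"""
    busiest = sorted(per_thread_cpu(sample).items(), key=lambda kv: kv[1], reverse=True)
    print(busiest[:2])  # [('W#01', 97.0), ('W#02', 95.0)]
```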
I know this behaviour is normal, but what looks strange to me is that nothing changes with or without Hyperscan (neither better nor worse).
Of course I don’t expect great performance on this kind of board, but all things considered, I assumed the bottleneck was rule handling, so I also assumed that introducing Hyperscan would help.
Additional info:
Suricata is configured in IPS mode with NFQ.
I upgraded Suricata from 7.0.0 to 7.0.2 yesterday and I’m observing better performance (the maximum throughput increased by around 25%), but there is still no difference with or without Hyperscan.
What does your suricata.yaml look like, and how do you start Suricata?
Especially regarding the number of queues in the NFQ setup.
There is also a limit to what a pure switch to Hyperscan can achieve. The bottleneck could be something else, since adding more and more rules always increases the pressure on the engine.
It also depends on what the signatures look like: some benefit more from Hyperscan and some less.
I took some time to answer because I wanted to repeat the test with Suricata 7.0.2, since this release performs much better than 7.0.0, at least in my test bed.
The attached file suricataIPS_HS.yaml contains my suricata.yaml.
The maximum throughput in the best-case (and unrealistic) condition (fixed frame length 1518):
35236 rules: ~160 Mbps
Rules downloaded from the ET Open source
I can achieve the same value without Hyperscan (setting mpm-algo=ac-ks and spm-algo=bm).
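One way to make such an A/B comparison less error-prone is to keep a single suricata.yaml and override only the matchers on the command line with Suricata’s `--set` option, so both runs share everything else, including the NFQ queues passed with `-q`. A minimal sketch, assuming `mpm-algo` and `spm-algo` are top-level keys in suricata.yaml (as in the default config) and that the binary is `suricata` on PATH; `suricata_cmd` and the `VARIANTS` table are hypothetical names, and the commands are only printed, not executed.

```python
import shlex

# Matcher combinations compared in this thread: (mpm-algo, spm-algo).
VARIANTS = {
    "hyperscan": ("hs", "hs"),
    "no-hyperscan": ("ac-ks", "bm"),
}


def suricata_cmd(config: str, queues: list, mpm: str, spm: str) -> list:
    """Build an NFQ IPS command line that differs only in the pattern matchers."""
    cmd = ["suricata", "-c", config]
    for queue in queues:
        cmd += ["-q", str(queue)]  # one -q per NFQUEUE number
    cmd += ["--set", f"mpm-algo={mpm}", "--set", f"spm-algo={spm}"]
    return cmd


if __name__ == "__main__":
    for name, (mpm, spm) in VARIANTS.items():
        print(name, shlex.join(suricata_cmd("suricataIPS_HS.yaml", [0], mpm, spm)))
```

Running the same traffic profile against both command lines, with nothing else changed, leaves the matcher as the only variable between runs.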
I get somewhat better results with Hyperscan only if I set the worker thread affinity to different CPUs (W#01 on CPU 0, W#02 on CPU 2), but I still need to verify this more carefully:
hs + hs: ~200 Mbps
ac-ks + bm: ~180 Mbps
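Expressed as relative changes, the approximate figures above compare as follows. The sketch only does arithmetic on the numbers quoted in this thread (all ~ values in Mbps, 35236 rules, 1518-byte frames); `relative_gain` is a hypothetical helper, and reading both pinned figures as using the same affinity setup is an assumption. Since the inputs are rounded, small percentages should not be over-interpreted.

```python
def relative_gain(new: float, baseline: float) -> float:
    """Fractional change of `new` over `baseline` (0.25 means +25%)."""
    return (new - baseline) / baseline


# Approximate throughputs reported in this thread, in Mbps.
unpinned = 160.0   # hs and ac-ks+bm gave about the same value
pinned_hs = 200.0  # mpm/spm = hs + hs, workers pinned to CPU 0 and CPU 2
pinned_ac = 180.0  # mpm/spm = ac-ks + bm, assumed to use the same pinning

print(f"hs vs ac-ks+bm, pinned:  {relative_gain(pinned_hs, pinned_ac):+.1%}")  # +11.1%
print(f"pinning alone, hs:       {relative_gain(pinned_hs, unpinned):+.1%}")   # +25.0%
print(f"pinning alone, ac-ks+bm: {relative_gain(pinned_ac, unpinned):+.1%}")   # +12.5%
```

On these numbers, pinning the workers changes throughput more than switching the matcher does.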
Note:
I consider the maximum throughput to be the point where the first packets are dropped, but the values above are theoretical (CPU idle is 0%, packets flow without threats, only big 1512-byte packets, stats and logging disabled). In a real network scenario the maximum throughput would be much lower.
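The “first dropped packets” criterion can be turned into a repeatable search by bisecting the offered rate between a rate known to be clean and one known to drop. A minimal sketch, assuming drops are monotonic in the offered rate; `has_drops` is a hypothetical callback that would drive the traffic generator at the given rate and check a drop counter, which is setup-specific and not shown here.

```python
from typing import Callable


def max_lossless_rate(
    has_drops: Callable[[float], bool],
    low: float,
    high: float,
    tolerance: float = 5.0,
) -> float:
    """Highest offered rate (Mbps) without drops, found by bisection.

    Invariant: `low` is drop-free and `high` drops. Stops when the bracket is
    no wider than `tolerance` and returns the drop-free end.
    """
    if has_drops(low):
        raise ValueError("even the lowest rate drops packets")
    if not has_drops(high):
        return high  # never saturated within the tested range
    while high - low > tolerance:
        mid = (low + high) / 2
        if has_drops(mid):
            high = mid
        else:
            low = mid
    return low


if __name__ == "__main__":
    # Simulated device that starts dropping above 160 Mbps (illustrative only).
    print(max_lossless_rate(lambda rate: rate > 160.0, low=10.0, high=1000.0, tolerance=1.0))
```

Repeating each probe a few times and accepting a rate only when every repetition is clean makes the result less sensitive to run-to-run noise.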
I don’t get better values if I use more queues.
I can also provide the maximum throughput with fewer rules:
2 rules (ftp-events.rules): ~780 Mbps
115 rules (decoder-events.rules): ~400 Mbps
but these 2 values were obtained under different conditions: to achieve more than ~400 Mbps I need to enable DPDK (my Suricata instance doesn’t have DPDK support) and reserve 1 CPU for it, so only 1 core is left for Suricata.
Your CPU is not good enough to run Suricata, in my opinion.
My CPU is an ARM (4x A76 + 4x A55) and the throughput is about 941 Mbps with Hyperscan, while my other CPU is an ARM (4x A55) and the throughput is about 500 Mbps with Hyperscan.
A more powerful CPU with more cores is better. Just for your reference.
Moving to AF_PACKET is complicated in the particular router configuration I’m testing now.
htop and perf are not available on the board. I can provide the top output (top -b -H); the attached files show 2 different Suricata configurations with 35236 loaded rules.
Suricata with Hyperscan + 1 queue + worker threads pinned to 2 different CPUs (at the moment this is the best configuration):
topSuricataHS.log: top output
statsHS.log: stats.log
Suricata with Hyperscan + runmode=workers + 2 queues + worker threads pinned to 2 different CPUs.
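Comparing the two configurations is easier with the stats.log counters in a dictionary. A minimal sketch, assuming the default stats.log table layout (`Counter | TM Name | Value` rows, totals only) and keeping the latest value of each counter; `parse_stats_log` is a hypothetical helper, and which counters to compare (for example `decoder.pkts` against the drop or IPS counters) should be checked against the names actually present in statsHS.log.

```python
def parse_stats_log(text: str) -> dict:
    """Latest 'Total' value of each counter in a Suricata stats.log."""
    counters = {}
    for line in text.splitlines():
        parts = [part.strip() for part in line.split("|")]
        # Data rows have exactly three columns; skip the column header row.
        if len(parts) != 3 or parts[0] == "Counter":
            continue
        name, tm_name, value = parts
        if tm_name == "Total" and value.isdigit():
            counters[name] = int(value)  # later snapshots overwrite earlier ones
    return counters


if __name__ == "__main__":
    # Illustrative sample only.
    sample = """\
------------------------------------------------------------------------------------
Counter                                       | TM Name                   | Value
------------------------------------------------------------------------------------
decoder.pkts                                  | Total                     | 1000
decoder.pkts                                  | Total                     | 2500
"""
    print(parse_stats_log(sample)["decoder.pkts"])  # 2500
```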
Yes, I don’t expect much more either (and I agree with Samiux: this CPU is not the best option to host Suricata, at least as an IPS with the full ruleset), but since I can get almost the same throughput without Hyperscan, I assumed I should get higher values with Hyperscan.
Could it depend on the lack of AVX instructions in the Intel Atom C3338?
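One way to check what the CPU offers is to read the `flags` line of /proc/cpuinfo. For context, Hyperscan requires at least SSSE3, and when built with its fat runtime it picks among code paths for SSE4.2/POPCNT, AVX2 and AVX-512 at startup, so a CPU without AVX still runs Hyperscan, just on a non-AVX path. A small sketch for x86 Linux; `simd_flags` is a hypothetical helper and the flag list is a selection, not an exhaustive requirement list.

```python
import os

# Instruction-set flags relevant to Hyperscan's x86 code paths (selection only).
WANTED = ("ssse3", "sse4_2", "popcnt", "avx2", "avx512bw")


def simd_flags(cpuinfo: str) -> dict:
    """Report which of WANTED appear in the first 'flags' line of /proc/cpuinfo."""
    for line in cpuinfo.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            present = set(value.split())
            return {flag: flag in present for flag in WANTED}
    return {flag: False for flag in WANTED}


if __name__ == "__main__" and os.path.exists("/proc/cpuinfo"):
    with open("/proc/cpuinfo") as f:
        print(simd_flags(f.read()))
```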
What does the nftables config look like regarding the queues?
The top output alone is not enough; it won’t show exactly where performance is being lost. I would recommend installing the perf tools to get a better picture.
It could be that some optimizations don’t apply to this CPU, but in general Hyperscan should always improve performance.
So I still suspect another bottleneck, which we should be able to see once we have more debugging info (with perf).