Poor Performance When Using the 'flow' Keyword

JTsai · February 24, 2025, 8:21am

We’re running Suricata 7.0.8 with DPDK in IPS mode. The NIC bandwidth is 25 Gbps, but our forwarding throughput is only ~700 Mbps with the following signatures:

Signature Configuration:

I’ve configured approximately 5,000 pure L4 rules. In our testing environment, the number of source IPs, destination IPs, and destination ports is limited, so to simulate a large ruleset that would normally cover many different destination IPs, I vary the source port instead. Here’s an example of our rules:

pass udp any 60000 -> 8.8.8.8 80 (msg:"acl pass udp port 60000 visit to 8.8.8.8 and port 80"; flow:to_server; sid:1000001; rev:1; priority:2;)

pass udp any 60001 -> 8.8.8.8 80 (msg:"acl pass udp port 60001 visit to 8.8.8.8 and port 80"; flow:to_server; sid:1000002; rev:1; priority:2;)

...

pass udp any 65535 -> 8.8.8.8 80 (msg:"acl pass udp port 65535 visit to 8.8.8.8 and port 80"; flow:to_server; sid:1005536; rev:1; priority:2;)

drop ip any any -> any any (msg:"acl drop by default"; flow:to_server; sid: 1000000; rev:1; priority:5;)
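For anyone who wants to reproduce this, here is a minimal sketch of how a ruleset shaped like the one above could be generated. The function name `make_rules` and its parameters are hypothetical and chosen only for illustration. The port range and sid numbering are inferred from the three example rules shown, so the output may not match the attached file byte for byte.

```python
def make_rules(first_port=60000, last_port=65535,
               dst_ip="8.8.8.8", dst_port=80, base_sid=1000001):
    """Build one 'pass udp' rule per source port, plus a default drop rule.

    All names and defaults are assumptions taken from the example rules
    in this post, not from the real suricata.rules attachment.
    """
    rules = []
    for offset, sport in enumerate(range(first_port, last_port + 1)):
        sid = base_sid + offset  # port 60000 -> sid 1000001, and so on
        rules.append(
            f"pass udp any {sport} -> {dst_ip} {dst_port} "
            f'(msg:"acl pass udp port {sport} visit to {dst_ip} '
            f'and port {dst_port}"; '
            f"flow:to_server; sid:{sid}; rev:1; priority:2;)"
        )
    # Catch-all drop rule, copied verbatim from the example above.
    rules.append(
        'drop ip any any -> any any (msg:"acl drop by default"; '
        "flow:to_server; sid: 1000000; rev:1; priority:5;)"
    )
    return rules


if __name__ == "__main__":
    # Write the generated rules to a local file (file name is arbitrary).
    with open("generated.rules", "w") as fh:
        fh.write("\n".join(make_rules()) + "\n")
```

Dropping the `flow:to_server;` fragment from the format strings gives a comparison ruleset without the flow keyword, which makes it easy to measure the throughput difference between the two variants.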

Issue:

I’ve noticed that using the ‘flow: to_server’ keyword significantly degrades forwarding performance. I understand that using the flow keyword causes rules to be treated as non-IP-Only rules (as discussed here). However, AWS recommends using the flow keyword in rules, and our use case requires the flow keyword for certain purposes. I’m looking for ways to optimize performance while retaining the flow keyword.

Analysis:

I generated a flame graph using perf, which shows the DetectRulePacketRules function consuming almost all CPU time, with DetectPortLookupGroup taking up the majority of that. Could someone explain why this function’s share is so high? The function itself appears to be quite simple.

sbhardwaj · February 24, 2025, 10:52am

Hi!

Thanks for the report. Is it possible for you to share the ruleset file and the following setting in your suricata.yaml?

detect:
  profile: medium
  custom-values:
    toclient-groups: 3
    toserver-groups: 25

JTsai · February 24, 2025, 11:04am

Sure. This is my detect section in suricata.yaml:

detect:
  profile: medium
  custom-values:
    toclient-groups: 3
    toserver-groups: 25
  sgh-mpm-context: auto
  inspection-recursion-limit: 3000
  # If set to yes, the loading of signatures will be made after the capture
  # is started. This will limit the downtime in IPS mode.
  #delayed-detect: yes

And this is my ruleset file:

suricata.rules (719.1 KB)

sbhardwaj · March 8, 2025, 4:45pm

Hi, there!
Apologies for the late reply. The issue you’re seeing is indeed valid and already has an open ticket: Bug #3771: Extreme performance degradation when doing IP-only rules with flow-keyword - Suricata - Open Information Security Foundation
However, no work has been started on this yet. We’ll try to add it to our roadmap soon.

Thank you for your report!