Poor Performance When Using the 'flow' Keyword

We’re running Suricata 7.0.8 with DPDK in IPS mode. The NIC bandwidth is 25 Gbps, but our forwarding throughput is only ~700 Mbps with the following signatures:

Signature Configuration:

I’ve configured approximately 5,000 pure L4 rules. In our testing environment, the number of source IPs, destination IPs, and destination ports is limited. To simulate a scenario with numerous rules with different destination IPs, I’m using a large number of source ports. Here’s an example of our rules:

pass udp any 60000 -> 8.8.8.8 80 (msg:"acl pass udp port 60000 visit to 8.8.8.8 and port 80"; flow:to_server; sid:1000001; rev:1; priority:2;)

pass udp any 60001 -> 8.8.8.8 80 (msg:"acl pass udp port 60001 visit to 8.8.8.8 and port 80"; flow:to_server; sid:1000002; rev:1; priority:2;)

...

pass udp any 65535 -> 8.8.8.8 80 (msg:"acl pass udp port 65535 visit to 8.8.8.8 and port 80"; flow:to_server; sid:1005536; rev:1; priority:2;)

drop ip any any -> any any (msg:"acl drop by default"; flow:to_server; sid: 1000000; rev:1; priority:5;)
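For anyone who wants to reproduce this without the attached file, a ruleset of the same shape can be generated with a short script. This is only a minimal sketch that follows the pattern of the examples above (one pass rule per source port from 60000 to 65535, then the default drop rule). The function name `build_rules` and its parameters are made up for illustration, and the actual rules file may differ from what this produces.

```python
def build_rules(first_port=60000, last_port=65535, first_sid=1000001):
    """Return the ACL-style rules as a list of strings.

    Assumption: one pass rule per source port, with consecutive sids,
    mirroring the example rules in this post. Hypothetical helper.
    """
    rules = []
    for offset, port in enumerate(range(first_port, last_port + 1)):
        rules.append(
            f"pass udp any {port} -> 8.8.8.8 80 "
            f'(msg:"acl pass udp port {port} visit to 8.8.8.8 and port 80"; '
            f"flow:to_server; sid:{first_sid + offset}; rev:1; priority:2;)"
        )
    # Default drop rule, copied verbatim from the example above.
    rules.append(
        'drop ip any any -> any any (msg:"acl drop by default"; '
        "flow:to_server; sid: 1000000; rev:1; priority:5;)"
    )
    return rules


if __name__ == "__main__":
    # Print the rules so the output can be redirected into a .rules file.
    print("\n".join(build_rules()))
```

Redirecting the output to a file gives something that can be loaded as a rule file. Narrowing the port range makes it easy to check how throughput changes with the number of rules.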

Issue:

I’ve noticed that using the flow:to_server keyword significantly degrades forwarding performance. I understand that the flow keyword causes rules to be treated as non-IP-only rules (as discussed here). However, AWS recommends using the flow keyword in rules, and our use case requires it. I’m looking for ways to optimize performance while retaining the flow keyword.

Analysis:

I generated a flame graph using perf, which shows the DetectRulePacketRules function consuming almost all CPU time, with DetectPortLookupGroup taking up the majority of that. Could someone explain why this function’s share is so high? The function itself appears to be quite simple.

Hi!

Thanks for the report. Could you share the ruleset file and the following settings from your suricata.yaml?

detect:
  profile: medium
  custom-values:
    toclient-groups: 3
    toserver-groups: 25

Sure. This is my detect section in suricata.yaml:

detect:
  profile: medium
  custom-values:
    toclient-groups: 3
    toserver-groups: 25
  sgh-mpm-context: auto
  inspection-recursion-limit: 3000
  # If set to yes, the loading of signatures will be made after the capture
  # is started. This will limit the downtime in IPS mode.
  #delayed-detect: yes

And this is my ruleset file:

suricata.rules (719.1 KB)