Reduce CPU and % drops

Andreas_Herz · January 18, 2024, 8:57am

Do you have the stats.log from the test without rules?

Can you also run the perf top command like last time in the scenario without rules?

If you run with rules, how many of them are IP-only? (You can see this in suricata.log.)
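To answer the IP-only question without scrolling through suricata.log by hand, a small Python sketch can pull the count out of the rule-loading summary. It assumes the summary line reads like `78000 signatures processed. 1560 are IP-only rules, ...`; that wording is an assumption, so check it against your own suricata.log and adjust the regex if your version phrases it differently. The log path in the usage comment is also an assumption.

```python
import re
import sys
from typing import Optional, Tuple

# ASSUMPTION: wording of Suricata's rule-loading summary line; verify it
# against your own suricata.log, since it may differ between versions.
SUMMARY_RE = re.compile(r"(\d+) signatures processed\. (\d+) are IP-only rules")


def ip_only_summary(log_text: str) -> Optional[Tuple[int, int, float]]:
    """Return (total, ip_only, ip_only_percent) from the last summary line found."""
    matches = SUMMARY_RE.findall(log_text)
    if not matches:
        return None
    total, ip_only = (int(x) for x in matches[-1])  # the most recent (re)load wins
    pct = 100.0 * ip_only / total if total else 0.0
    return total, ip_only, pct


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 ip_only.py /var/log/suricata/suricata.log  (path is an assumption)
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        result = ip_only_summary(fh.read())
    if result is None:
        print("no signature summary line found")
    else:
        total, ip_only, pct = result
        print(f"{ip_only} of {total} signatures are IP-only ({pct:.1f}%)")
```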

Could you still try running the tshark command, so that we can rule out unidirectional traffic as the cause?
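The exact tshark command referred to ("like last time") is not quoted in this thread. As a sketch under that gap, one way to check for one-sided traffic is to export TCP endpoints with `tshark -r capture.pcap -T fields -e ip.src -e ip.dst -e tcp.srcport -e tcp.dstport > conv.tsv` (the file names are hypothetical) and count the conversations whose packets were only ever seen in one direction. It handles IPv4 TCP only; rows with empty fields (non-TCP or IPv6 packets) are skipped.

```python
import sys
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Endpoint = Tuple[str, str]  # (ip, port)


def one_way_fraction(lines: Iterable[str]) -> float:
    """Fraction of TCP conversations whose packets were seen in only one direction.

    Each line is assumed to be tab-separated: ip.src, ip.dst, tcp.srcport, tcp.dstport.
    """
    directions: Dict[Tuple[Endpoint, Endpoint], Set[bool]] = defaultdict(set)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 4 or not all(fields):
            continue  # skip non-TCP, IPv6 or incomplete rows
        src = (fields[0], fields[2])
        dst = (fields[1], fields[3])
        key = (min(src, dst), max(src, dst))  # same key for both directions
        directions[key].add(src == key[0])  # True = "forward", False = "reverse"
    if not directions:
        return 0.0
    one_way = sum(1 for seen in directions.values() if len(seen) == 1)
    return one_way / len(directions)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 oneway.py conv.tsv
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        frac = one_way_fraction(fh)
    print(f"{frac:.1%} of TCP conversations were seen in one direction only")
```

A high one-way fraction would suggest the sensor only receives one side of many sessions, which is the unidirectional case to exclude.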

kilsergin · January 18, 2024, 9:03am

I have a theory: it seems to be caused by the rules. Checking now.

kilsergin · January 18, 2024, 9:57am

When we add our own rules, we get this percentage of drops and the CPU is at 99%. With only the ET rules, everything works fine.

Our total number of rules is ~78000.

What server specifications are needed for a workload like this?

How many cores?
What processor?
What network card?
Any other characteristics worth considering?

Another question: with drops like these, what should we revise in the rules? perf top shows: 34.67% libpcre2-8.so.0.11.0 [.] pcre2_match_8

Andreas_Herz · January 18, 2024, 10:04am

So if the system runs fine with just the ET rules, it would be best to work on your custom rules.

You can use rule profiling to check the impact, as described in 11.9. Rule Profiling (Suricata 8.0.0-dev documentation). Keep in mind that compiling and enabling rule profiling has an impact on performance, so don’t run it in production all the time. (Don’t confuse it with full packet profiling, which is a different kind of profiling.)

Also see 9.4. Rules Profiling (Suricata 8.0.0-dev documentation) for how to control it.

Ideally, you can narrow it down to a small number of rules that have a big impact on performance and try to improve those. Without knowing the exact rules, it’s hard to say what to improve. But yes, a single bad rule can kill performance.
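Since `pcre2_match_8` dominates the perf top output, one cheap first pass over the custom rules is to look for rules that use `pcre` without any `content` match. A content match gives Suricata a fast pattern to prefilter on, whereas a pcre-only rule may have its regex evaluated far more often. The script below is a rough heuristic, not a rule parser: it detects keywords with regexes and can be fooled by unusual option text. Rule profiling remains the authoritative way to find the expensive rules.

```python
import re
import sys
from typing import Iterable, List, Tuple

SID_RE = re.compile(r"\bsid\s*:\s*(\d+)")
# Keyword must follow "(", ";" or whitespace so "pcre:" inside other words is ignored.
PCRE_RE = re.compile(r"(?:^|[;(\s])pcre\s*:")
CONTENT_RE = re.compile(r"(?:^|[;(\s])content\s*:")


def pcre_without_content(rule_lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Return (sid, pcre_count) for active rules that use pcre but no content.

    Heuristic only: keyword detection is regex-based, not a full rule parser.
    """
    flagged: List[Tuple[str, int]] = []
    for line in rule_lines:
        rule = line.strip()
        if not rule or rule.startswith("#"):
            continue  # skip blank lines and commented-out rules
        start, end = rule.find("("), rule.rfind(")")
        if start == -1 or end <= start:
            continue  # no option block
        options = rule[start:end + 1]
        pcre_count = len(PCRE_RE.findall(options))
        if pcre_count and not CONTENT_RE.search(options):
            sid = SID_RE.search(options)
            flagged.append((sid.group(1) if sid else "?", pcre_count))
    return flagged


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 pcre_only.py custom.rules [more.rules ...]
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as fh:
            for sid, n in pcre_without_content(fh):
                print(f"{path}: sid {sid} has {n} pcre option(s) and no content")
```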

Mellanox cards should be fine in general. If the signatures are the root cause, you won’t see a huge difference with other cards such as Intel or Napatech, since the burden is on the CPU rather than on the NIC.
Likewise, adding more cores or faster cores might not solve the root cause.
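The point that better hardware does not fix expensive signatures can be made concrete with a per-packet cycle budget. The numbers below (16 cores, 2.5 GHz, 2 million packets per second) are purely hypothetical, and the model assumes work spreads evenly across cores, which real flow-to-worker hashing does not guarantee.

```python
def cycles_per_packet(cores: int, ghz: float, pps: float) -> float:
    """Average CPU cycles available per packet, assuming perfectly even load."""
    return cores * ghz * 1e9 / pps


# Hypothetical sizing: 16 cores at 2.5 GHz handling 2 million packets/s.
budget = cycles_per_packet(16, 2.5, 2_000_000)
print(f"{budget:.0f} cycles per packet")  # prints "20000 cycles per packet"

# Doubling the core count only doubles the budget; a rule set whose average
# per-packet cost is far above it will still drop packets.
print(f"{cycles_per_packet(32, 2.5, 2_000_000):.0f} cycles per packet")  # prints "40000 cycles per packet"
```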

kilsergin · January 19, 2024, 7:01am

Let’s try profiling the rules.

Can you suggest more specific OS-level tuning for the Mellanox network card and for the CPU?

And how should I adjust the suricata.yaml file to match those settings?

kilsergin · January 24, 2024, 11:32am

We started profiling and a log has been created. How should we analyze it?

rule_perf.log (47.0 KB)
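A quick way to read rule_perf.log is to pull out the most recent table and sort it by total ticks. The sketch assumes the plain-text layout (JSON output disabled), where each dump has a header line starting with `Num` and columns such as `Rule`, `Ticks`, `%`, `Checks`, `Matches` and `Avg Ticks`, separated by at least two spaces; that layout is an assumption, so compare it with your own file first. Rules with a high `%` and many checks but few matches are usually the first candidates for rewriting.

```python
import re
import sys
from typing import Dict, List


def parse_last_dump(text: str) -> List[Dict[str, str]]:
    """Parse the most recent rule-profiling table into a list of row dicts.

    ASSUMPTION: header columns are separated by 2+ spaces (some names, such as
    "Max Ticks", contain a single space) and data cells contain no spaces.
    """
    header: List[str] = []
    rows: List[Dict[str, str]] = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("Num") and "Ticks" in stripped:
            header = re.split(r"\s{2,}", stripped)
            rows = []  # a new dump begins; keep only the latest one
            continue
        fields = stripped.split()
        if header and len(fields) == len(header) and fields[0].isdigit():
            rows.append(dict(zip(header, fields)))
    return rows


def top_rules(text: str, n: int = 10) -> List[Dict[str, str]]:
    """Return the n rows with the highest total ticks."""
    rows = parse_last_dump(text)
    return sorted(rows, key=lambda r: int(r["Ticks"]), reverse=True)[:n]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 top_rules.py rule_perf.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        for row in top_rules(fh.read()):
            print(f"sid {row['Rule']:>10}  {row['%']:>6}%  "
                  f"checks {row['Checks']:>10}  matches {row['Matches']:>8}")
```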

We assigned the cores to the network card. Some questions:

Is it normal that all cores are loaded above 90%?
Core 11 is constantly at 100%; is this normal?
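To see the per-core picture behind those numbers, a minimal Linux-only sketch can sample /proc/stat twice and compute how busy each core was in between. It counts idle plus iowait as idle and uses the first eight counters (user through steal), since guest time is already included in user. The 90% flag threshold is an arbitrary choice for illustration.

```python
import sys
import time
from typing import Dict, Tuple


def parse_proc_stat(text: str) -> Dict[str, Tuple[int, int]]:
    """Map 'cpuN' -> (busy_jiffies, total_jiffies) from /proc/stat content."""
    result: Dict[str, Tuple[int, int]] = {}
    for line in text.splitlines():
        parts = line.split()
        if not parts or not parts[0].startswith("cpu") or parts[0] == "cpu":
            continue  # skip the aggregate "cpu" line and non-CPU lines
        # user nice system idle iowait irq softirq steal (guest is already in user)
        values = [int(v) for v in parts[1:9]]
        idle = values[3] + values[4]  # idle + iowait
        total = sum(values)
        result[parts[0]] = (total - idle, total)
    return result


def busy_percent(before: str, after: str) -> Dict[str, float]:
    """Per-core busy percentage between two /proc/stat snapshots."""
    first, second = parse_proc_stat(before), parse_proc_stat(after)
    out: Dict[str, float] = {}
    for cpu, (busy1, total1) in first.items():
        if cpu not in second:
            continue
        busy2, total2 = second[cpu]
        elapsed = total2 - total1
        out[cpu] = 100.0 * (busy2 - busy1) / elapsed if elapsed > 0 else 0.0
    return out


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 core_busy.py 1   (sampling interval in seconds)
    with open("/proc/stat") as fh:
        snap1 = fh.read()
    time.sleep(float(sys.argv[1]))
    with open("/proc/stat") as fh:
        snap2 = fh.read()
    usage = busy_percent(snap1, snap2)
    for cpu in sorted(usage, key=lambda name: int(name[3:])):
        flag = "  <-- 90%+ busy" if usage[cpu] >= 90.0 else ""
        print(f"{cpu:>6} {usage[cpu]:5.1f}%{flag}")
```

This sketch lumps irq and softirq time into "busy"; if core 11 also handles the NIC’s interrupts, reporting those counters separately would show whether it is saturated by interrupt handling or by detection work.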

Andreas_Herz · February 28, 2024, 8:26pm

Well, it depends on how much work the cores have to do; it looks like they are reaching their limits. In htop you can enable an extra column that shows which core each process is running on, so you can check whether anything stands out for the process(es) running on core 11.
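As a scripted alternative to the htop column, Linux exposes the core each thread last ran on as field 39 (`processor`) of /proc/<pid>/task/<tid>/stat. The sketch below lists a process’s threads sorted by that field, so you can see which threads sit on core 11. You supply Suricata’s PID yourself, for example from `pidof suricata`.

```python
import os
import sys
from typing import List, Tuple


def parse_task_stat(line: str) -> Tuple[str, int]:
    """Return (thread_name, last_cpu) from one /proc/<pid>/task/<tid>/stat line."""
    # comm (field 2) is in parentheses and may contain spaces, so split after the last ')'.
    open_idx, close_idx = line.index("("), line.rindex(")")
    name = line[open_idx + 1:close_idx]
    rest = line[close_idx + 1:].split()  # rest[0] is field 3 (state)
    processor = int(rest[39 - 3])  # field 39: CPU the thread last executed on
    return name, processor


def threads_by_cpu(pid: int) -> List[Tuple[int, str, int]]:
    """Return (cpu, thread_name, tid) for every thread of pid, sorted by cpu."""
    rows: List[Tuple[int, str, int]] = []
    task_dir = f"/proc/{pid}/task"
    for tid in os.listdir(task_dir):
        try:
            with open(os.path.join(task_dir, tid, "stat")) as fh:
                name, cpu = parse_task_stat(fh.read())
        except OSError:
            continue  # the thread exited between listdir() and open()
        rows.append((cpu, name, int(tid)))
    return sorted(rows)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 threads_by_cpu.py "$(pidof suricata)"
    for cpu, name, tid in threads_by_cpu(int(sys.argv[1])):
        print(f"cpu {cpu:>3}  tid {tid:>7}  {name}")
```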

kilsergin · April 24, 2024, 10:13am

Good afternoon! The situation was resolved by reducing the number of signatures. Thank you for your help!