Ruleset loading time

Hi everyone,

I’m not new to Suricata, but I have a problem loading rules. I’ve received more than 2 million rules to apply, and processing them takes hours. When I look at htop, only one core is fully utilized, even though Suricata says it has 4 management threads.

I am already using Suricata 5.0.2, on a test server with 24 cores and 256 GB of RAM.

Are there any tips on how to speed up loading?

Thank you in advance

Luke

You use 2 million rules? That’s not an amount of rules a typical setup would use; large deployments range around 50,000 rules, and those take 1–2 minutes to load on high-performance systems. Feel free to post your Suricata config, but if that’s the amount of rules you want to use, I don’t think it will even perform after it has loaded.

If most of the 2M rules just match different domains in HTTP, DNS, or TLS traffic, you might be able to leverage datasets: https://suricata.readthedocs.io/en/latest/rules/datasets.html
Have a look at IP reputation as well: https://suricata.readthedocs.io/en/latest/reputation/ipreputation/ip-reputation.html
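To make those two suggestions a bit more concrete, here is a rough sketch of what dataset- and iprep-based rules can look like. All file paths, sids, and the category name are made up for illustration; check the linked docs for the exact options your version supports:

```
# Dataset: match DNS query names against a list of domains.
# (Entries in the .lst file for a "string" dataset are base64-encoded.)
alert dns any any -> any any (msg:"DNS query for listed domain"; \
    dns.query; dataset:isset,bad-domains, type string, load bad-domains.lst; \
    sid:1000001; rev:1;)

# IP reputation: alert when the source IP scores above 50 in a
# hypothetical "badips" category defined in the categories file.
alert ip any any -> any any (msg:"Source IP with bad reputation"; \
    iprep:src,badips,>,50; sid:1000002; rev:1;)

# suricata.yaml snippet pointing at the reputation files:
reputation-categories-file: /etc/suricata/iprep/categories.txt
default-reputation-path: /etc/suricata/iprep
reputation-files:
  - reputation.list
```

The point is that a single dataset/iprep rule replaces thousands of near-identical signatures, so the indicator list can grow without the ruleset (and its load time) growing with it.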

The initial rule loading is single-threaded, unfortunately. We also noticed this when running >200k rules (and updating them daily). I agree that datasets are the way to go if the rules are auto-generated from a secondary indicator set. Another option, in that case, would be to do downstream matching on fields in the EVE JSON output.


Thank you for all replies guys :slight_smile:

Andreas_Herz I know it’s not a typical amount of rules, but it works; I’m also not sure whether all the rules are actually evaluated. I’m sorry, but as a new user I’m not able to upload my config file.

@syoc I also thought about datasets, but I’ve never worked with them. A guide would be helpful, because I only have a vague idea of how they work and how to use them.

@satta This is probably what I needed to know, though I’m not happy about it. So datasets are probably the only way, right? And could you please be more specific about the other option?

Can you try again? What file type are you using? yaml/yml should both work.

Still nothing, and I’m using yaml…

Sorry, new users can not upload attachments.

I agree on matching downstream on EVE when you need that many rules; your sensor will thank you!

This is probably what I needed to know, but I’m not happy with that. So probably the only way is datasets, right? And could you please be more specific with another option?

If your workflow relies on results you get directly from Suricata (e.g. by working with a file containing EVE JSON), then datasets would be the best way to do it.

By downstream matching I mean looking at the EVE JSON metadata output and using an efficient matching approach that focuses on only some of the fields in there. This avoids having to consider your large indicator set when inspecting raw network data. We use FEVER (https://github.com/DCSO/fever) for that purpose: it consumes all EVE metadata and injects new alerts into the output event stream if, for example, a Bloom filter match for a domain in your indicator set occurs in the http.hostname, tls.sni, or dns.rrname fields. This allows us to match large indicator sets with acceptable specificity, with no Suricata reload time and only a little data to push to the sensor when updating the set.
However, this requires a more elaborate setup, since data needs to be passed around between Suricata, the downstream matcher, and whatever you use to consume alerts from the sensors (Logstash etc.).
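As a rough illustration of the downstream-matching idea (this is not FEVER’s actual code), here is a minimal Python sketch that scans EVE JSON lines and checks a few domain-carrying fields against an in-memory indicator set. The field names follow Suricata’s EVE schema; the indicator contents are made up, and a plain set stands in for FEVER’s Bloom filter:

```python
import json

# EVE fields that can carry a domain name, per event type.
FIELDS = {
    "dns": ("rrname",),
    "http": ("hostname",),
    "tls": ("sni",),
}

def match_events(eve_lines, indicators):
    """Yield (event_type, matched_value, event) for each EVE record
    whose relevant field is in the indicator set.

    `indicators` is a set of lowercase domain names; FEVER uses a
    Bloom filter instead to keep memory low for very large sets,
    at the cost of rare false positives.
    """
    for line in eve_lines:
        event = json.loads(line)
        etype = event.get("event_type")
        sub = event.get(etype, {}) if etype in FIELDS else {}
        for field in FIELDS.get(etype, ()):
            value = str(sub.get(field, "")).lower()
            if value in indicators:
                yield etype, value, event

if __name__ == "__main__":
    # Hypothetical indicator set; in practice this would be loaded
    # from your secondary indicator feed.
    indicators = {"evil.example.com"}
    sample = [json.dumps({"event_type": "dns",
                          "dns": {"rrname": "evil.example.com"}})]
    for etype, value, _ in match_events(sample, indicators):
        print(etype, value)
```

Since this runs entirely on the EVE output, updating the indicator set never touches Suricata itself: no reload, no rule compilation, just a new file for the matcher.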


Thank you for the explanation, I will look into it.

Hi @Luke, @jmtaylor90 put together a nice guide on getting started with datasets that might be helpful: https://trex421.blogspot.com/2019/09/datasets-with-suricata.html

:open_mouth: Thank you very much, this is going to be helpful :wink: