I’m not new to Suricata, but I have a problem loading rules. I have received more than 2 million rules to apply, and processing them takes hours. When I look at htop, only one core is fully utilized, even though Suricata says it has 4 management threads.
I am already using Suricata 5.0.2, and I am testing on a server with 24 cores and 256 GB of RAM.
You use 2 million rules? That’s not an amount a typical setup would use; even large deployments run around 50,000 rules, and those take 1–2 minutes to load on high-performance systems. Feel free to post your Suricata config, but with that many rules I don’t think it will even perform well after it has loaded.
Unfortunately, the initial rule loading is single-threaded. We also noticed this when running >200k rules (and updating them daily). I agree that datasets are the way to go if the rules are auto-generated from a secondary indicator set. Another option, in that case, would be to do downstream matching on fields in the EVE-JSON output.
This is probably what I needed to know, though I’m not happy about it. So datasets are probably the only way, right? And could you please be more specific about the other option?
If your workflow relies on results you get directly from Suricata (e.g. by consuming a file containing EVE-JSON), then datasets would be the best way to do it.
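To make the dataset suggestion concrete, here is a minimal sketch of how a large domain list can be matched with a single rule instead of millions of individual rules. The dataset name `ioc-domains`, the file path, and the sid are placeholders, not anything from this thread:

```
# One rule matching an entire indicator list via a dataset (Suricata 5+).
# Entries in the list file for "type string" datasets are base64-encoded.
alert dns any any -> any any (msg:"IOC domain seen in DNS query"; \
    dns.query; dataset:isset,ioc-domains,type string,load ioc-domains.lst; \
    sid:1000001; rev:1;)
```

Updating the indicator set then means regenerating the list file rather than reloading millions of rules.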
By downstream matching I mean looking at the EVE-JSON metadata output and using an efficient matching approach that only focuses on some of the fields in there. This avoids having to consider your large indicator set when inspecting raw network data. We use FEVER (https://github.com/DCSO/fever) for that purpose – it consumes all EVE metadata and injects new alerts into the output event stream if, for example, a Bloom filter match for a domain in your indicator set occurs in the http.hostname, tls.sni or dns.rrname fields. This allows us to match large indicator sets with acceptable specificity, with no Suricata reload time and only a small amount of data to push to the sensor when updating the set.
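The idea can be sketched in a few lines of Python: read EVE-JSON events line by line and check the domain-carrying fields against an indicator set. This is a simplified stand-in for what FEVER does (a plain set instead of a Bloom filter, and the field names are the ones mentioned above):

```python
import json

# Hypothetical indicator set; at FEVER's scale this would be a Bloom
# filter to keep memory use low for millions of entries.
INDICATORS = {"evil.example.com", "bad.example.org"}

# EVE-JSON fields that can carry a domain indicator.
FIELDS = (
    ("dns", "rrname"),
    ("tls", "sni"),
    ("http", "hostname"),
)

def match_event(event):
    """Return the matched (field, value) pair for an EVE event, or None."""
    for section, key in FIELDS:
        value = event.get(section, {}).get(key)
        if value and value.lower().rstrip(".") in INDICATORS:
            return (f"{section}.{key}", value)
    return None

def scan_eve(lines):
    """Yield indicator matches from an iterable of EVE-JSON lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        hit = match_event(json.loads(line))
        if hit:
            yield hit
```

In a real deployment the matches would be injected back into the event stream as alerts rather than just yielded, but the matching step itself is this simple.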
However, this requires a more elaborate setup, since data needs to be passed around between Suricata, the downstream matcher, and whatever you use to consume alerts from the sensors (Logstash, etc.).