I first added a rule that uses a dataset with 10M entries:
drop http any any -> any any (msg:"test_rule"; http.host; dataset:isset,dataset-file,load dataset-file.lst,type string,hashsize 2500000; sid:123; rev:1;)
Note: I set single-hashsize: 2500000 in the suricata.yaml file.
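For reference, this is roughly where that setting sits in my config. The limits: nesting is how I understand recent suricata.yaml layouts, so double-check it against the default config shipped with your version:

```yaml
# suricata.yaml (sketch; section layout may vary between Suricata versions)
datasets:
  limits:
    # raise the cap so a rule may request hashsize 2500000
    single-hashsize: 2500000
```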
I see an increase in memory, which is expected. However, once I update the dataset file to leave only 1 entry and reload the rules, the memory still stays high.
Does Suricata not clean up dataset memory during a rule reload? Any recommendation on which logs I can use to investigate the issue?
How is dataset memory determined? For example, if there are x entries in a dataset file and the hashsize is y, what would be the total additional memory used by the dataset rules?
I am running Suricata in Docker, so I am using `docker stats` before and after updating the dataset. Since I am not sending any traffic and not doing anything else, I assume the memory is coming from the dataset.
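Concretely, this is the check I run; the container name "suricata" is just what I called mine:

```sh
# one-shot snapshot of the container's memory usage
docker stats --no-stream suricata
```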
I also noticed that even after I reference a different dataset file and reload the rules, the memory doesn't go down. Here is what I observed (the reload/measure commands I used are shown after the list):
- rules referencing dataset1 with 100 entries: approx 800 MB
- updated dataset1 to 1 million entries and reloaded Suricata: approx 2.2 GB
- updated dataset1 back to 100 entries and reloaded Suricata: still approx 2.1 GB
- updated the rules to reference a new dataset2 with 100 entries and reloaded Suricata: still approx 2.1 GB
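For each step above I reloaded the ruleset and re-checked memory roughly like this (I go through the unix socket with suricatasc; sending SIGUSR2 to the Suricata process also triggers a rules reload). Again, the container name is just an example:

```sh
# reload the rules inside the container via the unix socket,
# then take a fresh memory snapshot
docker exec suricata suricatasc -c reload-rules
docker stats --no-stream suricata
```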
Please note that the post-reload cleanup does not happen right away after the reload. It gets picked up after all the other, more important reload-specific changes are taken care of. So please check the memory usage only after the notice message "rule reload complete".
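For example, something along these lines will show when that point has been reached (the default log path is assumed; adjust it for your container setup):

```sh
# wait for the reload-complete notice before comparing memory usage
tail -f /var/log/suricata/suricata.log | grep -i "rule reload complete"
```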