I first added a rule that uses a dataset with 10M entries.
drop http any any -> any any (msg:"test_rule"; http.host; dataset:isset,dataset-file,type string,load dataset-file.lst,hashsize 2500000; sid:123; rev:1;)
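For context, the .lst file loaded by the rule just contains one base64-encoded value per line (since the set is type string). The two placeholder entries below decode to example.com and suricata.io:

ZXhhbXBsZS5jb20=
c3VyaWNhdGEuaW8=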
Note that I set single-hashsize: 2500000 in the YAML file.
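Roughly, that part of my suricata.yaml looks like the snippet below (I am quoting the nesting from memory, so treat the exact placement of the key as approximate):

datasets:
  defaults:
    single-hashsize: 2500000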
I see an increase in memory, which is expected. However, once I update the dataset file to leave only 1 entry and reload the rules, the memory still stays high.
Does Suricata not clean up dataset memory during a rule reload? Any recommendation on which logs I can use to investigate the issue?
How is dataset memory determined? For example, if there are x entries in a dataset file and the hashsize is y, what would the total additional memory used by the dataset rules be?
I am running Suricata in a Docker container, so I am using docker stats before and after updating the dataset. Since I am not sending any traffic and not doing anything else, I assume the extra memory comes from the dataset.
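Concretely, the measurement is a single snapshot before and after each reload, along these lines (the container name is just a placeholder):

docker stats --no-stream suricata

The --no-stream flag prints one reading instead of continuously refreshing, which makes the before/after comparison easier.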
I also notice that even after I reference a different dataset file and reload, the memory doesn’t go down. Here is what I observed (the reload command I use is shown after the list):
rules referencing dataset1 with 100 entries: approx 800 MB
updated dataset1 to 1 million entries and reloaded Suricata: approx 2.2 GB
updated dataset1 back to 100 entries and reloaded Suricata: still approx 2.1 GB
updated the rules to reference a new dataset2 with 100 entries and reloaded Suricata: still approx 2.1 GB
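In case the reload method matters: it is a live rule reload over the unix socket from inside the container, along the lines of:

suricatasc -c reload-rules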