Suricata memory stays high after rule reload with fewer Dataset entries

I first added a rule that uses a dataset with 10M entries:

drop http any any -> any any (msg:"test_rule"; http.host; dataset:isset, dataset-file, load dataset-file.lst, type string, hashsize 2500000; sid:123; rev:1;)

Note: I set single-hashsize: 2500000 in the yaml file.
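
In case it is useful, this is roughly how I confirm which dataset settings Suricata actually parsed (a sketch; the container name my-suricata and the default config path inside the container are assumptions):

    # dump the parsed configuration inside the container and filter for dataset settings
    docker exec my-suricata suricata --dump-config | grep -i dataset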

I see an increase in memory, which is expected. However, once I update the dataset file to leave only 1 entry and reload the rules, the memory still stays high.

  1. Does Suricata not clean up dataset memory during a rule reload? Any recommendation on which logs I can check to investigate the issue?
  2. How is dataset memory usage determined? For example, if there are x entries in a dataset file and the hashsize is y, what would be the total additional memory used by the dataset rules?
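
For reference, this is how I trigger the reload after changing the dataset file (a sketch; it assumes the unix socket and suricatasc are available inside the container, and my-suricata is a placeholder name):

    # ask the running Suricata to reload its rules (and the datasets they load)
    docker exec my-suricata suricatasc -c reload-rules
    # alternatively, send SIGUSR2 to trigger a rule reload
    # (this assumes suricata runs as PID 1 inside the container)
    docker exec my-suricata kill -USR2 1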

Could you please tell us how you are checking that the memory increase is in fact because of datasets?

I am running Suricata in a Docker container, so I am using docker stats before and after updating the dataset. Since I am not sending any traffic and not doing anything else, I assume the increase is coming from the dataset.
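
Concretely, something like this (the container name is a placeholder):

    # snapshot the container's memory usage before and after touching the dataset
    docker stats --no-stream --format "{{.Name}}: {{.MemUsage}}" my-suricata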

I also notice that even after I reference a different dataset file and reload the rules, the memory doesn't go down:

  1. rules referencing dataset1 with 100 entries: approx 800 MB
  2. updated dataset1 with 1 million entries and reloaded Suricata: approx 2.2 GB
  3. updated dataset1 with 100 entries and reloaded Suricata: still approx 2.1 GB
  4. updated the rules to reference a new dataset2 with 100 entries and reloaded Suricata: still approx 2.1 GB
  5. restarted Suricata: now approx 800 MB
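
In case it helps to reproduce this, a dataset file of a given size can be generated with something like the following (a sketch; the file name and entry pattern are made up, and it assumes string datasets expect base64-encoded entries in the load file):

    # hypothetical helper: write N base64-encoded string entries into the dataset file
    # (slow for millions of entries, but fine as an illustration)
    N=1000000
    seq 1 "$N" | while read -r i; do
      printf 'host%d.example.com' "$i" | base64
    done > dataset1.lst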

thank you!

  1. Could you please share the exact dataset rule?
  2. Please note that the post-reload cleanup does not happen right away after the reload; it is picked up after the other, more important reload-specific changes are taken care of. So, please check the memory usage after the notice message “rule reload complete”.
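
For example, something along these lines (a sketch; the container name and the default suricata.log location are assumptions):

    # confirm the reload (and the post-reload cleanup) has been reported before measuring
    docker exec my-suricata grep -i "rule reload complete" /var/log/suricata/suricata.log
    # then take the memory snapshot
    docker stats --no-stream my-suricata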