So today I have modified a couple of templates that I’ve been using for a long time.
One was an alert http rule, using the http_header_header buffer and a simple content match. That was modified to match on http.host, now using a pcre to reduce false positives. The second one was an alert dns rule with a simple content match. That was modified to match on the dns.query buffer, with the same kind of pcre instead of content. The pcre looks like pcre:"/(?:^|.)FQDN$/i"; for both the dns and http rules, where before it was content:"FQDN";.
These two templates are then used by a script that generates rules from a list of ~5000 FQDNs, so in the end almost 10K rules are generated from that.
I tested the rules on a couple of test instances, and observed no issue. But after pushing this, I noticed that a significant number of my instances started segfaulting after reload or restart, like this:
W#10-eth0: segfault at 2100000021 ip 0000564a18378801 sp 00007fa21ae87730 error 4 in suricata[564a181d1000+57b000]
Upon manual restart, I could see RSS rising and rising until they crashed. But interestingly not all instances, maybe 25% of them. I’m running 5.0.4 mostly, but have also observed the problem in some 5.0.5 instances I just upgraded to see if it would help. It does seem somehow related to traffic volume, as idle or low-traffic instances don’t seem to segfault so far.
I still have some more troubleshooting to do to try to isolate this, but if you have any suggestions you can advance on what the issue could be here, it would be very much appreciated. The new rules are certainly more demanding than the previous simpler versions, but they’re not really esoteric, I think…
Thanks in advance.
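The rule-generation step described above, as an added example that is not the poster's script or Suricata project code: it takes a list of FQDNs and emits a dns.query and an http.host rule for each, using the anchored `(?:^|.)FQDN$` pcre shape from the post. Two design choices are worth noting. First, the example escapes the dots in the FQDN and in the boundary alternation, because an unescaped `.` in a regex matches any character, so `(?:^|.)example.test$` would also fire on `notexample.test`. Second, it keeps a `content` match beside the `pcre` so Suricata's multi-pattern matcher can prefilter and the pcre only runs on candidate buffers; a pcre-only rule forces the regex engine to run on every buffer. The FQDN list, `msg` text, `classtype` and sid range are constructed placeholders; `nocase` is omitted on the http.host rule because that buffer is already lowercased by Suricata. The `pcre_matches` helper checks the generated pattern with Python's `re`, which is only valid here because the pattern uses syntax PCRE and Python share.

```python
"""Generate Suricata dns.query / http.host rule pairs from a list of FQDNs.

Constructed example: FQDN list, msg text, classtype and sid range are
placeholders, not values from the original post.
"""
import re

# Constructed sample input standing in for the ~5000-entry FQDN list.
FQDNS = ["example.test", "files.example.test", "Bad-CDN.invalid"]

SID_BASE = 9000000  # placeholder sid range; choose one that does not collide locally

# Plain hostnames only: labels of letters, digits and hyphens, at least one dot.
_HOSTNAME_RE = re.compile(r"^[a-z0-9-]+(?:\.[a-z0-9-]+)+$", re.IGNORECASE)


def fqdn_pcre(fqdn):
    """Return the pcre option value for one FQDN.

    Dots are escaped so they match a literal '.', not any character. The
    pattern is anchored: it matches the FQDN itself or any subdomain of it,
    but not a longer name that merely contains it (e.g. notexample.test).
    """
    if not _HOSTNAME_RE.match(fqdn):
        raise ValueError("not a plain hostname: %r" % fqdn)
    escaped = fqdn.replace(".", r"\.")
    return r"/(?:^|\.)%s$/i" % escaped


def make_rules(fqdn, sid):
    """Return the (dns, http) rule pair for one lowercased FQDN.

    The content match lets the multi-pattern matcher prefilter; the pcre
    refines only the buffers that contain the FQDN as a substring.
    """
    pcre = fqdn_pcre(fqdn)
    # dns.query is matched as sent, so nocase is needed on the content.
    dns = ('alert dns any any -> any any (msg:"Constructed DNS query for %s"; '
           'dns.query; content:"%s"; nocase; pcre:"%s"; '
           'classtype:policy-violation; sid:%d; rev:1;)') % (fqdn, fqdn, pcre, sid)
    # http.host is normalized to lowercase by Suricata; the content is already
    # lowercase, so no nocase here.
    http = ('alert http any any -> any any (msg:"Constructed HTTP host %s"; '
            'http.host; content:"%s"; pcre:"%s"; '
            'classtype:policy-violation; sid:%d; rev:1;)') % (fqdn, fqdn, pcre, sid + 1)
    return dns, http


def generate(fqdns, sid_base=SID_BASE):
    """Expand a list of FQDNs into a flat list of rules, two per FQDN."""
    rules = []
    for i, fqdn in enumerate(fqdns):
        rules.extend(make_rules(fqdn.strip().lower(), sid_base + 2 * i))
    return rules


def pcre_matches(pcre, name):
    """Evaluate a generated pcre value against a hostname using Python's re.

    Valid here because the generated pattern uses only syntax that PCRE and
    Python's re interpret identically (anchors, a non-capturing group, \\.).
    """
    body, flags = pcre.rsplit("/", 1)
    pattern = body[1:]  # drop the leading '/' delimiter
    opts = re.IGNORECASE if "i" in flags else 0
    return re.search(pattern, name, opts) is not None


if __name__ == "__main__":
    for rule in generate(FQDNS):
        print(rule)
```

Writing the output of `generate()` to a `.rules` file and loading it with `suricata -T` validates the syntax before any instance sees traffic; the `pcre_matches` check is the offline sanity test for the anchoring, confirming that `files.example.test` fires while `notexample.test` and `example.test.evil.invalid` do not.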