After first starting Suricata up, everything runs fine for a few hours but eventually I get (seemingly) unrecoverable packet loss.
Monitoring the processes with perf top, I see `rs_dns_state_get_tx` and `AppLayerDefaultGetTxIterator` slowly creep up in overhead % and eventually overtake `DetectRun.part.16`. Once this happens, I start getting packet loss.
I disabled the DNS parsers and have been running for 17 hours with an average of 0.2% packet loss, versus an average of 7.2% packet loss over my last 15-hour run with the DNS parsers enabled. When the packet loss starts with the DNS parsers enabled, I see about 30% packet loss and cannot recover until I kill the feed or restart Suricata.
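For anyone wanting to compare runs the same way: a run-wide average hides when the loss actually starts, so it can help to look at loss per stats interval. Below is a minimal sketch, assuming EVE stats logging is enabled and that the `capture.kernel_packets` and `capture.kernel_drops` counters are present for your capture method. The function name `drop_percentages` is made up for illustration, and using `kernel_packets` as the denominator is an assumption that may not match how every capture method counts.

```python
import json


def drop_percentages(eve_lines):
    """Yield (timestamp, drop % for that interval) from EVE stats records.

    The counters are cumulative, so each value is computed from the
    difference between two consecutive stats records.
    """
    prev_packets = prev_drops = None
    for line in eve_lines:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip blank or truncated lines
        if not isinstance(event, dict) or event.get("event_type") != "stats":
            continue
        capture = event.get("stats", {}).get("capture", {})
        packets = capture.get("kernel_packets")
        drops = capture.get("kernel_drops")
        if packets is None or drops is None:
            continue
        # A counter going backwards means Suricata was restarted;
        # in that case only reset the baseline.
        if prev_packets is not None and packets >= prev_packets:
            delta_packets = packets - prev_packets
            delta_drops = drops - prev_drops
            if delta_packets > 0:
                yield event.get("timestamp"), 100.0 * delta_drops / delta_packets
        prev_packets, prev_drops = packets, drops
```

Usage would be something like `for ts, pct in drop_percentages(open("eve.json")): print(ts, pct)`, where the path to `eve.json` depends on your setup.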
I see that the master branch has some recent changes to src/app-layer-parser.c, including some transaction cleanup. I may try to merge those changes into the 6.0.1 build locally and see how it goes.
I can see that on some deployments as well; we will keep an eye on it. If you could test with Suricata 5.0.5, that would help us narrow it down to changes between 5 and 6.
I’ve found a few cases with TCP DNS where this can happen, in particular where DNS TCP streams are long-lived and messages may be lost, or where the client floods the server (which I have seen on the real internet).
This patch should help with the issue, but we’re looking at better ways as well. Please let me know. If you know there is no TCP DNS in your traffic, this is unlikely to help.
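To illustrate why a long-lived TCP DNS stream could make the two functions from the perf output creep up over time, here is a toy model. It is not Suricata's code. It assumes that unanswered requests stay in the per-flow state, that looking up a transaction by id is a linear scan, and that the default iterator probes each id in turn through that lookup. `ToyDnsState` and `work_for` are hypothetical names.

```python
class ToyDnsState:
    """Toy per-flow state: a list of pending transaction ids."""

    def __init__(self):
        self.transactions = []
        self.next_id = 0
        self.comparisons = 0  # counts work done inside get_tx

    def new_request(self):
        """Add a request that never receives a response."""
        self.transactions.append(self.next_id)
        self.next_id += 1

    def get_tx(self, tx_id):
        """Look up a transaction by id with a linear scan."""
        for tx in self.transactions:
            self.comparisons += 1
            if tx == tx_id:
                return tx
        return None

    def iterate_all(self):
        """Visit every pending transaction by probing each id via get_tx."""
        if not self.transactions:
            return 0
        found = 0
        for tx_id in range(self.transactions[0], self.next_id):
            if self.get_tx(tx_id) is not None:
                found += 1
        return found


def work_for(pending):
    """Comparisons needed for one full pass over `pending` open transactions."""
    state = ToyDnsState()
    for _ in range(pending):
        state.new_request()
    state.iterate_all()
    return state.comparisons


for n in (10, 100, 1000):
    print(n, work_for(n))
```

In this model one pass costs n(n+1)/2 comparisons for n pending transactions, so the per-packet cost keeps growing as long as the stream stays open and nothing is cleaned up. That would be consistent with loss that appears only after hours and does not recover, but whether it matches what happens in your traffic is a guess.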