High overhead from rs_dns_state_get_tx causing packet loss

applezip · February 22, 2021, 5:24pm

After first starting Suricata up, everything runs fine for a few hours but eventually I get (seemingly) unrecoverable packet loss.

Watching the process with perf top, I see `rs_dns_state_get_tx` and `AppLayerDefaultGetTxIterator` slowly creep up in overhead % until they overtake `DetectRun.part.16`. Once that happens, I start getting packet loss.

This was after 15hrs:
Samples: 2M of event ‘cycles’, 4000 Hz, Event count (approx.): 1171695080430 lost: 0/0 drop: 0/0
Overhead Shared Object Symbol
46.41% suricata [.] rs_dns_state_get_tx
28.55% suricata [.] AppLayerDefaultGetTxIterator
9.87% suricata [.] FlowGetProtoMapping
4.15% suricata [.] DetectRun.part.16
1.01% suricata [.] DetectEnginePktInspectionRun
0.91% suricata [.] DetectEngineInspectRulePacketMatches
0.73% suricata [.] rs_sip_state_get_tx

I have attached the stats output from the same time as a text file. statsout.log (6.6 KB)

OS: CentOS8 stream
CPU: 2x Xeon E5-2699 v4 (88 HT cores)
RAM: 128GB
NIC: Napatech NE40E3-4
Data rate: 10gbps sustained (pushing bigFlows.pcap from tcpreplay.appneta.com through a packet broker to the napatech)

It’s slow to diagnose as it takes hours for the function to creep up to the top of the list in perf.

vjulien · February 23, 2021, 8:49am

What version of Suricata are you using?

applezip · February 23, 2021, 11:37am

Whoops, that seems like a pretty important detail.

Suricata 6.0.1, compiled from source. Build info attached. buildinfo.log (6.7 KB)

applezip · February 24, 2021, 1:23pm

I disabled the DNS parsers and have been running for 17hrs with an average of 0.2% packet loss, versus an average of 7.2% packet loss over my last 15hr run with the DNS parsers enabled. When the packet loss starts with the DNS parsers enabled, I see about 30% packet loss and I cannot recover until I kill the feed or restart Suricata.

I see that the master branch has some recent changes to src/app-layer-parser.c, including some transaction cleanup. I may try to merge those changes into the 6.0.1 build locally and see how it goes.

Andreas_Herz · February 27, 2021, 8:01pm

I can see that on some deployments as well, we will keep an eye on that. If you could test it with Suricata 5.0.5 that would help us to narrow it down to changes from 5 to 6.

applezip · March 2, 2021, 11:37am

After an 18hr run with 5.0.5, I have 0% packet loss. Same host, same settings, still 10gbps sustained.

vjulien · March 2, 2021, 12:26pm

We have some fixes in the just released 6.0.2 that might help. Are you able to try it out? (See Suricata 6.0.2 and 5.0.6 released)

applezip · March 3, 2021, 11:20am

I am still seeing the same issue with 6.0.2. Averaging 8.1% packet loss after 14 hours at 10gbps.

I’ll attach the main perf top and annotations for ‘rs_dns_state_get_tx’ and ‘AppLayerDefaultGetTxIterator’

vjulien · March 3, 2021, 3:42pm

Are you able to provide a perf top screenshot after running it with the -g option? I’d like to see if we can find out which path leads to these calls.

ish · March 3, 2021, 9:19pm

Are you willing to try a patch or 2? I’ve somewhat replicated this by crafting a misbehaving DNS client, but I’ve seen similar in the real world.

applezip · March 8, 2021, 4:01pm

Yes, I can try some local patches.

Here’s the updated output from perf:

ish · March 17, 2021, 5:47pm

I’ve found a few cases with TCP DNS where this can happen, in particular where there are DNS TCP streams that are long lived and messages may be lost, or the client floods the server (which I have seen on the real internet).
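To make the cost pattern concrete, here is a minimal Python sketch of how a pile of never-completed transactions on one long-lived flow could make the two functions at the top of the perf output expensive. It is a toy model, not Suricata code: `ToyDnsState`, `iterate_all`, and `cost_of_one_pass` are hypothetical names, and it assumes that looking up a transaction by id is a linear scan over the transactions still held, and that the default iterator visits ids one at a time by calling that lookup.

```python
class ToyDnsState:
    """Hypothetical stand-in for a per-flow parser state (not Suricata code)."""

    def __init__(self):
        self.transactions = []  # ids of transactions that were never freed
        self.next_id = 0
        self.steps = 0          # comparisons performed by get_tx

    def new_tx(self):
        # Assumption: a lost reply or a flooding client leaves the tx in the list.
        self.transactions.append(self.next_id)
        self.next_id += 1

    def get_tx(self, tx_id):
        # Assumed lookup strategy: scan the list until the id matches.
        for tx in self.transactions:
            self.steps += 1
            if tx == tx_id:
                return tx
        return None


def iterate_all(state):
    """Visit every transaction by id, calling get_tx once per id."""
    found = 0
    for tx_id in range(state.next_id):
        if state.get_tx(tx_id) is not None:
            found += 1
    return found


def cost_of_one_pass(n):
    """Comparisons needed for one full pass over n outstanding transactions."""
    state = ToyDnsState()
    for _ in range(n):
        state.new_tx()
    iterate_all(state)
    return state.steps


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, cost_of_one_pass(n))
```

Under these assumptions one pass costs n(n+1)/2 comparisons, so the work per packet grows quadratically with the number of transactions left on the flow. That would be consistent with overhead that creeps up over hours rather than appearing at startup.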

This patch should help with the issue, but we’re looking at better ways as well. Please let me know. If you know there is no TCP DNS in your traffic, this is unlikely to help.

https://github.com/jasonish/suricata/commit/ddb78e60de5a35f09548b6d93e55a57accfb4e05.patch

Thanks.
