I’m seeing some unexpected behaviour with the midstream flag.
As an example:
With Suricata deployed with stream.midstream: false, I’m receiving a stream of “X” http logs/second.
With Suricata deployed with stream.midstream: true, I’m receiving considerably fewer (on the order of 10% of X).
I have reproduced this multiple times with both Suricata 6.0.2 and 6.0.4.
This doesn’t make sense to me.
My understanding is that with midstream = false, Suricata will ignore TCP streams for which it hasn’t seen the 3-way handshake, and with midstream = true, it will attempt to inspect those flows.
Based on that description, setting midstream = true should, if anything, produce more logs, not fewer (i.e., Suricata should see all the flows it would have with midstream = false, plus the flows where it missed the handshake), correct?
I’m wondering if anyone can provide some insight into what might be happening and/or suggest investigation steps?
In parallel, I’m going to attempt to trace through the source to see if I can find out where the disconnect is.
What does the memory usage (memcap/memuse counters) look like? It is conceivable that inspecting more leads to more memory use. Similarly, more work can lead to more packet drops, which might hurt all the sessions.
This is exactly where my investigation has led me, as well.
I apparently missed it on the initial pass through the stats, but the tcp.segment_memcap_drop counter is non-zero.
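For anyone wanting to check the same counters, here is a rough Python sketch that pulls them out of the EVE log. It assumes stats events are enabled in the eve.json output and that the counters sit under `stats.tcp` and `stats.capture` with the names used below (this matches what I see on 6.0.x, but verify against your own output). The function name `summarize_stats` is made up for this example. Since the counters are cumulative by default, only the most recent stats record is reported.

```python
import json


def summarize_stats(lines):
    """Return key memory/drop counters from the last stats event in EVE JSON lines.

    `lines` is any iterable of strings (for example an open eve.json file).
    Returns None if no stats event is found.
    """
    latest = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            # Skip partially written or corrupt lines.
            continue
        if not isinstance(event, dict) or event.get("event_type") != "stats":
            continue
        latest = event.get("stats", {})

    if latest is None:
        return None

    # Counter names are assumptions based on Suricata 6.0.x eve.json output.
    tcp = latest.get("tcp", {})
    capture = latest.get("capture", {})
    return {
        "tcp.memuse": tcp.get("memuse", 0),
        "tcp.reassembly_memuse": tcp.get("reassembly_memuse", 0),
        "tcp.segment_memcap_drop": tcp.get("segment_memcap_drop", 0),
        "capture.kernel_packets": capture.get("kernel_packets", 0),
        "capture.kernel_drops": capture.get("kernel_drops", 0),
    }
```

Usage would be something like `with open("/var/log/suricata/eve.json") as f: print(summarize_stats(f))`, where the path is only the common default and may differ on your deployment. Comparing the output between a midstream: false run and a midstream: true run should show whether the memcap drops only appear in the latter.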
We didn’t see this in Suricata 4 w/ midstream == true, so I guess Suricata 6 is generally consuming more of the TCP buffer per flow? I thought this memory space was largely used for TCP reassembly, though, so the main contributor to growth would be proportional to the traffic and not affected by the version of Suricata (?).
I’m currently investigating different tweaks to these parameters to see whether I can avoid these drops.
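For reference, these are the suricata.yaml settings I mean. The values below are purely illustrative, not a recommendation or something I have validated. As far as I understand, tcp.segment_memcap_drop is tied to the reassembly memcap rather than the session memcap, so that is the first one to look at.

```yaml
stream:
  memcap: 256mb        # illustrative value; memory for TCP session tracking
  midstream: true
  reassembly:
    memcap: 1gb        # illustrative value; memory for reassembly segments
    depth: 1mb         # how far into each stream to reassemble; lower uses less memory per flow
```

Raising the memcaps trades RAM for fewer drops, while lowering the depth reduces per-flow usage at the cost of inspecting less of each stream.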
One of the changes since Suricata 4 is that more protocols are capable of keeping track of things when there is packet loss. Previously, a single lost data segment in HTTP, for example, would stop HTTP traffic for that session completely. The consequence is that in a scenario where we are overwhelmed, we continue to use more resources than before.