Suricata 5.0.6 inline on RHEL dropping TLS traffic with no alerts

Hi all,
I’m looking for feedback on what I believe is an issue with Suricata unexpectedly dropping TLS traffic. The company I work for runs Suricata in inline mode (alert rules only) with nfqueue/iptables (4 queues, balanced, with bypass) and fail2ban watching fast.log on our application server/appliance. Our application is essentially a webserver GUI that brokers external client/server connectivity using SSH tunneling.

Under normal application usage we haven’t historically seen issues with traffic interruptions or drops. However, we recently identified an issue with our API/SDK connections that seems to indicate traffic being dropped or blocked at the IDS. A persistent API/SDK connection that repeatedly touches our API endpoint works fine for a limited period of time and then stops getting any responses. The eve.json logs show the initial inbound SSL/TLS connection attempts, which turn into drops shortly afterwards under the same flow_id. This sequence of TLS requests and subsequent drops continues until Suricata is stopped, or until the API/SDK connection is hung up and we wait a short time before restarting.

Unfortunately, the eve.json logs show no alerts, and the fast and alert-debug logs are empty. I verified that the nfqueue isn’t getting overloaded, the fail2ban stats show no blocks, and other non-SSL traffic from the same source IP can reach the server even while SSL/TLS is blocked.

My planned next steps are to set the drop-invalid flag to false, and possibly disable all tls parsing just to see if that has any effect.

All in all we’re at a loss for what is causing the connection drop/hang and any advice on where to look would be greatly appreciated.

Test system stats:
CentOS Linux release 7.9.2009 (Core)
kernel: 3.10.0-1160.25.1.el7.x86_64
Suricata 5.0.6
suricata-update version 1.1.3 (rev: ac3ddb2)
iptables 1.4.21
fail2ban 0.11.1

Thanks,
– Gene Crumpler

Can you post the stats.log?

Suricata can also drop invalid/broken traffic without triggering a specific alert.
If, for some reason, Suricata is running into performance issues you can try the nfqueue bypass option.

We do have the nfqueue bypass option set. I’ve also now tested after disabling the app layer protocol parser for TLS, as well as disabling the ‘drop-invalid’ stream engine setting. Neither had an impact on the drops we’re seeing. I’ve attached a stats.log excerpt from a time when we were seeing consistent packet drops, though it does not cover the same timestamps as the filtered and sanitized eve.json I’ve also attached. I haven’t had a chance to sanitize the corresponding pcap file yet, but if it would be helpful in tracking this down, I can upload that as well.

If I’m reading the eve.json and packet captures correctly, a narrow view of the drop events, filtered by client-side IP/port, seems to show that a valid TLS connection is established and application data passes back and forth normally for some period of time. Then a long delay occurs and the TLS connection shows a close/timeout from the client side. At this point, however, all FIN/ACK responses sent by the server are dropped by Suricata for some unknown reason. Eventually the flow times out.
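
For anyone who wants to repeat this kind of per-flow view, here is a rough Python sketch of the idea, not the exact filtering I used. It assumes eve.json holds one JSON object per line and that each record of interest has the `flow_id` and `event_type` fields; the function names are made up for illustration.

```python
import json
from collections import defaultdict


def summarize_flows(lines):
    """Group EVE records by flow_id and list their event types in log order."""
    flows = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated or partially written lines
        flow_id = event.get("flow_id")
        if flow_id is None:
            continue  # some records (e.g. stats) carry no flow_id
        flows[flow_id].append(event.get("event_type", "?"))
    return dict(flows)


def flows_with_drops(summary):
    """Keep only the flows that contain at least one 'drop' event."""
    return {fid: types for fid, types in summary.items() if "drop" in types}
```

Passing an open file handle for eve.json to `summarize_flows` and the result to `flows_with_drops` should list each affected flow with its sequence of event types, which makes it easier to see whether a `tls` event is followed only by `drop` events.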

sanitized_suri_drops_7-28_eve.json (8.1 KB)
stats.log (180.5 KB)

Thanks so much for your feedback and assistance,
–GC

Small update, it looks like the drop-invalid setting was not correctly disabled. I’ve corrected this and we’re testing again to validate if the issue still exists.
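
One way to double-check that a setting really took effect is to look at what Suricata itself parsed rather than at the YAML file. The sketch below is only an illustration: it assumes the output of `suricata --dump-config` is a list of `key = value` lines, and the key names `stream.drop-invalid` and `app-layer.protocols.tls.enabled` are my assumption based on the suricata.yaml layout, so please verify them against your own dump.

```python
def parse_dump_config(text):
    """Turn 'key = value' lines into a dict; lines without ' = ' are ignored."""
    settings = {}
    for line in text.splitlines():
        key, sep, value = line.partition(" = ")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def is_disabled(settings, key):
    """True only if the key is present and set to a 'false'-like value."""
    return settings.get(key, "").lower() in {"no", "false", "off", "0"}


# Assumed key names; check them against the actual dump on your system.
KEYS_TO_CHECK = ("stream.drop-invalid", "app-layer.protocols.tls.enabled")


def report(text):
    """Map each key of interest to True (disabled) or False (enabled or missing)."""
    settings = parse_dump_config(text)
    return {key: is_disabled(settings, key) for key in KEYS_TO_CHECK}
```

Feeding the captured dump text to `report` gives a quick yes/no per key. A missing key is reported as not disabled, which is the safer reading when the default is unknown.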

In the stats.log I don’t see drops, but I do see some ips.blocked, which might be due to rules that trigger. Do you still see this issue?
It would also help to get some metrics on the system, since traffic spikes or other load on the CPU could be a reason.
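
To see when the ips.blocked counter actually moves, something like the following Python sketch could help. It assumes the usual stats.log layout: a `Date:` line opening each snapshot, followed by `Counter | TM Name | Value` rows, with totals only (no per-thread rows). The function names are just for illustration.

```python
def parse_stats_log(lines, prefix="ips."):
    """Return a list of (date, {counter: value}) snapshots for matching counters."""
    snapshots = []
    current = None
    for line in lines:
        line = line.strip()
        if line.startswith("Date:"):
            current = {}
            snapshots.append((line[len("Date:"):].strip(), current))
            continue
        parts = [p.strip() for p in line.split("|")]
        if current is None or len(parts) != 3:
            continue  # separators, header text, or rows before the first Date line
        name, _thread, value = parts
        if name.startswith(prefix) and value.isdigit():
            current[name] = int(value)
    return snapshots


def counter_deltas(snapshots, name):
    """Counters are cumulative, so report the change between consecutive snapshots."""
    values = [(date, counters.get(name, 0)) for date, counters in snapshots]
    return [(d2, v2 - v1) for (_, v1), (d2, v2) in zip(values, values[1:])]
```

Calling `counter_deltas(parse_stats_log(fh), "ips.blocked")` on an open stats.log shows in which intervals blocks were counted, which can then be lined up with the drop timestamps in eve.json and with CPU or traffic metrics from the same period.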