Suricata inline mode - works so well and then crashes

Hi All,

I got Suricata running on pfSense in inline mode on my LAN, but it crashes after several hours or when I do certain things. I have tested so many ways and tweaked so many settings and I’m still having trouble figuring out what exactly causes it to crash. This may be multiple issues at the same time. It’s hard to narrow down, so all I can do is provide all the information for the various ways in which it crashed.

The machine is a Lenovo with dual Xeon e-2637 CPUs and 32GB of ECC RAM. Internet is 1.5Gbps fiber. I’ve tried a few different NICs:

  • some cheaper amazon 2.5Gbe dual NICs with I225-V chips
  • a $250 dual 10Gbe NIC with intel X540-T2 chips
  • the onboard NIC which shows up as em(0)

I thought it worked better with the X540 NIC and I got my hopes up that it was working, but then it crashed again too. Here’s what I’ve tried so far:

  • increased all the memcaps
  • disabled offloading
  • switched to the Hyperscan algo
  • tried with zero rules running
  • increased some other values (I forget exactly which: buffer size, packet size, etc.). I tried increasing each value substantially while testing it, one change at a time.

If it doesn’t crash on its own, it often crashes when running a speed test. Previously it looked like it would only crash when running a speed test from a PC using a USB network adapter. Then later I saw it crash from another PC as well, even though it wasn’t crashing before using that same test scenario.

When it crashes usually I can get to the pfsense from another interface and just restart the suricata service. Other times that won’t work and I have to restart either the dhcpd or the unbound (dns resolver) before it will work again.

I’ve selected all the logging options I could find in the Suricata settings, and I’ve got a syslog server on LAN while running Suricata on OPT1, so that when the OPT1 interface crashes the logs would hopefully still go through over LAN. It looks like the logs were still getting through, but I couldn’t find any unusual messages besides some invalid rule errors etc.

I also noticed it is much easier to crash when my WAN is set to PPPoE. With my provider I can’t do true bridged mode / DHCP so I have to use PPPoE. Then I tried using a 2nd pfSense router: the parent router connects over PPPoE and serves the child router a DHCP WAN (i.e. double NAT), and I would test Suricata on the child router. For a while it looked like that was working, but then surprise surprise it crashed again just when I was starting to think PPPoE was the problem.

It’s hard to even find anything out of the ordinary in the log files. At one point when using the previous NICs (the cheap Amazon I225-V ones) it was showing netmap fail - head kring blah blah blah, but that message doesn’t come up with the more expensive X540 NIC. It doesn’t seem to give any message before just crashing silently.

So in summary, even though I don’t know anything about networking, just testing and troubleshooting has led me to have strong suspicions about PPPoE, about cheap NICs, and about the USB network adapter on the client PC that was running the speed test. But as I say, it hasn’t been consistent and I’ve found it to crash under all different situations.

At this point I’m willing to buy a new machine if anyone could recommend a perfectly suited machine out of the box that can run pfSense/Suricata in inline mode and achieve 1.5Gbps throughput. I had no problem getting those speeds with Suricata in inline mode on the current hardware. It works perfectly when it works. CPU only goes to about 20%, RAM only goes to about 15%, it was successfully dropping packets and everything. Then just dead. So frustrating. Should I just keep testing with different NICs and keep returning the ones that don’t work to Amazon?

edit - one other thing I was wondering - think it will work better on Windows? I could load Windows Server on that machine instead, maybe the drivers work better? Have you found that Suricata is just as stable on Windows with your setup? I thought I would ask before I open that can of worms.

When you say “it crashes”, can you be a bit more specific? Does the interface “stall” and stop passing traffic? You say nothing of note is being logged when the event occurs.

Also, can you provide the pfSense version you are running? I am the Suricata package maintainer for pfSense, and this is the first report I’ve heard of this behavior on pfSense. However, several OPNsense users of Suricata with Inline IPS Mode have experienced similar stalling issues with netmap operation. I am wondering if there is a common FreeBSD issue that may be the underlying cause.

The OPNsense issue I referred to is being tracked upstream on this Suricata Redmine Issue: Bug #5744: netmap: 6.0.9 v14 backport causes known packet stalls from v14 implementation in "legacy" mode too - Suricata - Open Information Security Foundation.

Hey Bill thanks so much for your reply, I’m excited you’re here to look into it!

Yeah pretty much the interface doesn’t seem to pass anything, can’t reach the pfsense or internet, and when disconnecting/reconnecting a PC it sometimes can’t get a DHCP lease.

The other LAN interfaces besides the crashed one continue to function normally. I can log in to the pfSense that way, and most of the time I just have to restart the Suricata service. I had noticed that going into the Suricata interface settings and stopping the service on that interface might not fix it; I would have to stop the service under the Status > Services menu.

This might be a fluke but a few times (not really all that common) it seems I had to restart the dhcpd or unbound service. It may be just getting jammed up because of my own actions - when it would crash, I might go and switch around all the cables to different interfaces, different machines, etc. and there are a lot of devices talking.

So previously when using the I225-V NICs I did see this error appear:
nm_rxsync_prologue igc2 RX0: fail 'head < kring->nr_hwcur && head > kring->nr_hwtail' h 882 c 882 t 880 rh 882 rc 882 rt 880 hc 921 ht 880

I haven’t seen any errors like this with the X540-T2, though that NIC still had the same stalling issue. And even when I put the I225-V cards back in and continued testing, I didn’t see that same error anymore. Now it just doesn’t give any log messages. I do have the syslog server on LAN while testing Suricata on OPT1, so when OPT1 stalls it can still send logs over LAN. But will the logs related to the stalled OPT1 interface make it through? I’ll do some further testing later.
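If more of those kring errors turn up, a small script could pull the numbers out so they are easier to compare between stalls. This is only a rough Python sketch. The function names are made up for illustration, and it assumes that `h`, `hc` and `ht` in the message are the `head`, `nr_hwcur` and `nr_hwtail` values named in the failed condition, which is my reading of the line and not something confirmed in this thread.

```python
import re

# Matches the netmap "fail" line quoted above. Straight and curly quotes are
# both accepted because forum software often rewrites them.
KRING_FAIL = re.compile(
    r"(?P<func>nm_\w+) (?P<ring>\S+ \S+): fail ['‘’](?P<cond>.+?)['‘’] "
    r"h (?P<h>\d+) c (?P<c>\d+) t (?P<t>\d+) "
    r"rh (?P<rh>\d+) rc (?P<rc>\d+) rt (?P<rt>\d+) "
    r"hc (?P<hc>\d+) ht (?P<ht>\d+)"
)

NUMERIC_KEYS = ("h", "c", "t", "rh", "rc", "rt", "hc", "ht")


def parse_kring_fail(line):
    """Return the fields of a kring fail line as a dict, or None if no match."""
    match = KRING_FAIL.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    for key in NUMERIC_KEYS:
        fields[key] = int(fields[key])
    return fields


def logged_condition_holds(fields):
    """Re-check 'head < nr_hwcur && head > nr_hwtail' against the logged numbers.

    Assumption: h is head, hc is nr_hwcur, ht is nr_hwtail.
    """
    return fields["h"] < fields["hc"] and fields["h"] > fields["ht"]


if __name__ == "__main__":
    sample = (
        "nm_rxsync_prologue igc2 RX0: fail "
        "'head < kring->nr_hwcur && head > kring->nr_hwtail' "
        "h 882 c 882 t 880 rh 882 rc 882 rt 880 hc 921 ht 880"
    )
    print(parse_kring_fail(sample))
```

Feeding it every line of the system log and keeping the non-None results would give a quick list of which NIC and ring reported the failure each time.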

I ordered a couple of RTL8111 NICs that will come in a few days, so I’ll test those out and let you know what else I can see in the logs.

Here’s my version:

2.6.0-RELEASE (amd64)
built on Mon Jan 31 19:57:53 UTC 2022
FreeBSD 12.3-STABLE

OK, I tested those other two Realtek cards and it’s the same problem.

Further verified when I quickly switch my PC’s connection from one interface to another it seems to mess with the unbound service so I think that might be unrelated.

When I run a speed test and the interface becomes unresponsive, the NIC itself still shows LINK but no flashing ACT light. I looked at the pfSense system log and the Suricata log and can’t see anything. I’ll post a bit of each.

However, my Suricata log does have some warnings/errors related to rules when starting up. This shouldn’t be a problem, right? As mentioned, I can disable all the rules (even the built-in Suricata rules) and the problem still occurs with no rules set.

Here’s some of my system log. The log was cleared, then the speed test webpage was loaded at around 18:50. Note the DHCP and ARP errors. I don’t usually see those; I think they are because just before this I was switching the cable to a different interface. I ran the speed test at 22:19. No message at all when it stalled out. After that I disconnected from OPT2 and connected to LAN.

Dec 24 22:21:07 php-fpm 97667 /rc.linkup: Hotplug event detected for LAN(lan) static IP (192.168.1.1 )
Dec 24 22:21:06 kernel re2: link state changed to UP
Dec 24 22:21:06 check_reload_status 420 Linkup starting re2
Dec 24 22:21:05 check_reload_status 420 Reloading filter
Dec 24 22:21:05 php-fpm 381 /rc.linkup: Hotplug event detected for OPT2(opt2) static IP (192.168.6.1 )
Dec 24 22:21:04 kernel re0: link state changed to DOWN
Dec 24 22:21:04 check_reload_status 420 Linkup starting re0
Dec 24 22:18:55 dhcpleases 43615 Could not deliver signal HUP to process 13848: No such process.
Dec 24 22:18:51 php-fpm 382 /status_logs_settings.php: The command '/usr/sbin/arp -s '192.168.1.41' 'e4:5f:01:2b:8e:51'' returned exit code '1', the output was 'arp: writing to routing socket: Cannot allocate memory'
Dec 24 22:18:50 syslogd kernel boot file is /boot/kernel/kernel
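The excerpt above is listed newest-first, which makes it awkward to line up against the time of the speed test. A minimal Python sketch for pulling out just the entries around a stall, in chronological order, could look like this. The helper names are hypothetical, and the year has to be passed in because these syslog lines do not carry one (2022 here is taken from the Suricata log dates in this thread).

```python
from datetime import datetime


def parse_stamp(line, year):
    """Turn the leading 'Dec 24 22:21:07' of a syslog line into a datetime."""
    stamp = " ".join(line.split()[:3])
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")


def entries_between(lines, start, end, year=2022):
    """Return the log lines whose timestamp falls in [start, end], oldest first."""
    picked = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        when = parse_stamp(line, year)
        if start <= when <= end:
            picked.append((when, line))
    picked.sort(key=lambda pair: pair[0])
    return [line for _, line in picked]


if __name__ == "__main__":
    log = [
        "Dec 24 22:21:06 kernel re2: link state changed to UP",
        "Dec 24 22:21:04 kernel re0: link state changed to DOWN",
        "Dec 24 22:18:50 syslogd kernel boot file is /boot/kernel/kernel",
    ]
    # Everything from the 22:19 speed test up to 22:22.
    for entry in entries_between(
        log, datetime(2022, 12, 24, 22, 19), datetime(2022, 12, 24, 22, 22)
    ):
        print(entry)
```

With the real log, an empty result for the minute of the stall would back up the observation that nothing is logged when the interface hangs.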

Here is the Suricata log. This is at a different time. After stalling the interface, I connected to a different interface to check the Suricata log. This is all it showed. All of this was during Suricata startup, and after it says engine started, nothing else was logged for the crash.

24/12/2022 – 22:22:57 - – This is Suricata version 6.0.4 RELEASE running in SYSTEM mode
24/12/2022 – 22:22:57 - – CPUs/cores online: 16
24/12/2022 – 22:22:57 - – HTTP memcap: 671088640
24/12/2022 – 22:22:57 - – Netmap: Setting IPS mode
24/12/2022 – 22:22:57 - – fast output device (regular) initialized: alerts.log
24/12/2022 – 22:22:57 - – http-log output device (regular) initialized: http.log
24/12/2022 – 22:22:57 - – stats output device (regular) initialized: stats.log
24/12/2022 – 22:22:57 - – Syslog output initialized
24/12/2022 – 22:23:07 - – [ERRCODE: SC_ERR_INVALID_SIGNATURE(39)] - previous keyword has a fast_pattern:only; set. Can’t have relative keywords around a fast_pattern only content
24/12/2022 – 22:23:07 - – [ERRCODE: SC_ERR_INVALID_SIGNATURE(39)] - error parsing signature “drop tcp $EXTERNAL_NET $HTTP_PORTS → $HOME_NET any (msg:“MALWARE-OTHER Win.Trojan.Zeus Spam 2013 dated zip/exe HTTP Response - potential malware download”; flow:to_client,established; content:”-2013.zip|0D 0A|“; fast_pattern:only; content:”-2013.zip|0D 0A|“; http_header; content:”-“; within:1; distance:-14; http_header; file_data; content:”-2013.exe"; content:“-”; within:1; distance:-14; metadata:impact_flag red, policy balanced-ips drop, policy max-detect-ips drop, policy security-ips drop, ruleset community, service http; reference:url,VirusTotal; classtype:trojan-activity; sid:26470; rev:2;)" from file /usr/local/etc/suricata/suricata_47420_re0/rules/suricata.rules at line 31620
24/12/2022 – 22:23:08 - – [ERRCODE: SC_ERR_INVALID_SIGNATURE(39)] - depth or urilen 11 smaller than content len 17
24/12/2022 – 22:23:08 - – [ERRCODE: SC_ERR_INVALID_SIGNATURE(39)] - error parsing signature “drop tcp $HOME_NET any → $EXTERNAL_NET $HTTP_PORTS (msg:“MALWARE-CNC Win.Trojan.Scranos variant outbound connection”; flow:to_server,established; content:”/fb/apk/index.php"; fast_pattern:only; http_uri; urilen:<10; metadata:impact_flag red, policy balanced-ips drop, policy max-detect-ips drop, policy security-ips drop, service http; reference:url,VirusTotal; classtype:trojan-activity; sid:50525; rev:1;)" from file /usr/local/etc/suricata/suricata_47420_re0/rules/suricata.rules at line 37242
24/12/2022 – 22:23:08 - – [ERRCODE: SC_ERR_INVALID_SIGNATURE(39)] - “http_header” keyword seen with a sticky buffer still set. Reset sticky buffer with pkt_data before using the modifier.
24/12/2022 – 22:23:08 - – [ERRCODE: SC_ERR_INVALID_SIGNATURE(39)] - error parsing signature “drop tcp $EXTERNAL_NET $HTTP_PORTS → $HOME_NET any (msg:“MALWARE-CNC Osx.Trojan.Janicab runtime traffic detected”; flow:to_client,established; file_data; content:“content=|22|just something i made up for fun, check out my website at”; fast_pattern:only; content:“X-YouTube-Other-Cookies:”; nocase; http_header; metadata:impact_flag red, policy balanced-ips drop, policy max-detect-ips drop, policy security-ips drop, service http; reference:cve,2012-0158; reference:url,VirusTotal; classtype:trojan-activity; sid:27544; rev:3;)” from file /usr/local/etc/suricata/suricata_47420_re0/rules/suricata.rules at line 37834
24/12/2022 – 22:23:08 - – [ERRCODE: SC_ERR_INVALID_SIGNATURE(39)] - previous keyword has a fast_pattern:only; set. Can’t have relative keywords around a fast_pattern only content
24/12/2022 – 22:23:08 - – [ERRCODE: SC_ERR_INVALID_SIGNATURE(39)] - error parsing signature “drop tcp $HOME_NET any → $EXTERNAL_NET $HTTP_PORTS (msg:“MALWARE-CNC Win.Trojan.IcedId outbound connection”; flow:to_server,established; content:“Cookie: __gads”; fast_pattern:only; content:”__gads=“; http_cookie; content:”|3B| _gat=“; distance:0; http_cookie; content:”|3B| _ga=“; distance:0; http_cookie; content:”|3B| _u=“; distance:0; http_cookie; content:”|3B| __io=“; distance:0; http_cookie; content:”|3B| _gid=“; distance:0; http_cookie; metadata:impact_flag red, policy balanced-ips drop, policy max-detect-ips drop, policy security-ips drop, service http; reference:url,VirusTotal; classtype:trojan-activity; sid:58835; rev:1;)” from file /usr/local/etc/suricata/suricata_47420_re0/rules/suricata.rules at line 38042
24/12/2022 – 22:23:08 - – [ERRCODE: SC_ERR_UNKNOWN_REGEX_MOD(131)] - unknown regex modifier ‘K’
24/12/2022 – 22:23:08 - – [ERRCODE: SC_ERR_INVALID_SIGNATURE(39)] - error parsing signature “drop tcp $HOME_NET any → $EXTERNAL_NET $HTTP_PORTS (msg:“MALWARE-CNC Win.Backdoor.TreeTrunk outbound connection”; flow:to_server,established; urilen:10; content:”/index.jsp"; fast_pattern:only; http_uri; pcre:“/^([0-9A-F]{2}-){5}[0-9A-F]{2}$/K”; metadata:impact_flag red, policy balanced-ips drop, policy max-detect-ips drop, policy security-ips drop, service http; reference:url,VirusTotal; classtype:trojan-activity; sid:60270; rev:1;)" from file /usr/local/etc/suricata/suricata_47420_re0/rules/suricata.rules at line 38109
24/12/2022 – 22:23:08 - – [ERRCODE: SC_ERR_INVALID_SIGNATURE(39)] - “http_client_body” keyword seen with a sticky buffer still set. Reset sticky buffer with pkt_data before using the modifier.
24/12/2022 – 22:23:08 - – [ERRCODE: SC_ERR_INVALID_SIGNATURE(39)] - error parsing signature “drop tcp $HOME_NET any → $EXTERNAL_NET $HTTP_PORTS (msg:“MALWARE-CNC Win.Trojan.HannabiGrabber info stealer outbound communication”; flow:to_server,established; file_data; content:“Hannabi Grabber”; fast_pattern:only; http_client_body; content:”```fix|5C|nPCName:“; http_client_body; content:“GB|5C|nAntivirus:”; within:1000; http_client_body; metadata:impact_flag red, policy balanced-ips drop, policy max-detect-ips drop, policy security-ips drop, service http; reference:url,VirusTotal; classtype:trojan-activity; sid:60728; rev:1;)” from file /usr/local/etc/suricata/suricata_47420_re0/rules/suricata.rules at line 38173
24/12/2022 – 22:23:08 - – [ERRCODE: SC_ERR_NO_FILES_FOR_PROTOCOL(285)] - protocol tls doesn’t support file matching
24/12/2022 – 22:23:08 - – [ERRCODE: SC_ERR_INVALID_SIGNATURE(39)] - error parsing signature “drop tcp $HOME_NET any → $EXTERNAL_NET 443 (msg:“PUA-OTHER Authedmine TLS client hello attempt”; flow:to_server,established; file_data; ssl_state:client_hello; content:“authedmine.com”; fast_pattern:only; metadata:policy balanced-ips drop, policy max-detect-ips drop, policy security-ips drop, service http; classtype:misc-attack; sid:45952; rev:2;)” from file /usr/local/etc/suricata/suricata_47420_re0/rules/suricata.rules at line 39687
24/12/2022 – 22:23:08 - – [ERRCODE: SC_ERR_INVALID_SIGNATURE(39)] - Can’t use file_data with flow:to_server or flow:from_client with http.
24/12/2022 – 22:23:08 - – [ERRCODE: SC_ERR_INVALID_SIGNATURE(39)] - error parsing signature “drop tcp $EXTERNAL_NET any → $HOME_NET 8500 (msg:“SERVER-OTHER Hashicorp Consul services API remote code execution attempt”; flow:to_server,established; content:”/v1/agent/service/register"; fast_pattern:only; http_uri; content:“PUT”; http_method; file_data; content:“check”; content:“script”; within:25; metadata:policy balanced-ips drop, policy max-detect-ips drop, policy security-ips drop, service http; reference:url,Hashicorp Consul Remote Command Execution via Services API; classtype:attempted-admin; sid:49670; rev:2;)" from file /usr/local/etc/suricata/suricata_47420_re0/rules/suricata.rules at line 39783
24/12/2022 – 22:23:08 - – [ERRCODE: SC_ERR_INVALID_SIGNATURE(39)] - “http_uri” keyword seen with a sticky buffer still set. Reset sticky buffer with pkt_data before using the modifier.
24/12/2022 – 22:23:08 - – [ERRCODE: SC_ERR_INVALID_SIGNATURE(39)] - error parsing signature “drop tcp $EXTERNAL_NET any → $HOME_NET $HTTP_PORTS (msg:“SERVER-OTHER VMWare vSphere log4shell exploit attempt”; flow:to_server,established; content:“Content-Disposition”; nocase; http_client_body; content:“RelyingPartyEntityId”; distance:0; nocase; http_client_body; content:”|0D 0A 0D 0A|“; distance:0; http_client_body; base64_decode:bytes 64,relative; base64_data; pcre:”/\x24\x7b(jndi|[^\x7d]?\x24\x7b[^\x7d]?\x3a[^\x7d]?\x7d)/i"; content:“/websso/SAML2/SSOSSL/”; fast_pattern:only; http_uri; metadata:policy balanced-ips drop, policy max-detect-ips drop, policy security-ips drop, service http; reference:cve,2021-44228; reference:cve,2021-44832; reference:cve,2021-45046; reference:cve,2021-45105; classtype:attempted-user; sid:58812; rev:3;)" from file /usr/local/etc/suricata/suricata_47420_re0/rules/suricata.rules at line 39951
24/12/2022 – 22:23:08 - – [ERRCODE: SC_ERR_RULE_KEYWORD_UNKNOWN(102)] - unknown rule keyword ‘http_raw_cookie’.
24/12/2022 – 22:23:08 - – [ERRCODE: SC_ERR_INVALID_SIGNATURE(39)] - error parsing signature “drop tcp $EXTERNAL_NET any → $HOME_NET $HTTP_PORTS (msg:“SERVER-WEBAPP Multiple products DVR admin password leak attempt”; flow:to_server,established; content:”/device.rsp"; fast_pattern:only; http_uri; content:“uid=”; http_raw_cookie; content:“cmd=list”; http_client_body; metadata:policy balanced-ips drop, policy max-detect-ips drop, policy security-ips drop, service http; reference:cve,2018-9995; classtype:web-application-attack; sid:55839; rev:1;)" from file /usr/local/etc/suricata/suricata_47420_re0/rules/suricata.rules at line 39991
24/12/2022 – 22:23:08 - – [ERRCODE: SC_ERR_INVALID_SIGNATURE(39)] - depth or urilen 4 smaller than content len 10
24/12/2022 – 22:23:08 - – [ERRCODE: SC_ERR_INVALID_SIGNATURE(39)] - error parsing signature “drop tcp $EXTERNAL_NET any → $HOME_NET $HTTP_PORTS (msg:“SERVER-WEBAPP Grandstream UCM6202 series SQL injection attempt”; flow:to_server,established; content:“user_name=”; fast_pattern:only; http_uri; urilen:4; content:”/cgi"; nocase; http_uri; pcre:"/[?&]user_name=[^&]
?([\x27\x22\x3b\x23\x28]|\x2f\x2a|\x2d\x2d)/Ui"; metadata:policy balanced-ips drop, policy max-detect-ips drop, policy security-ips drop, service http; reference:cve,2020-5722; classtype:web-application-attack; sid:53858; rev:2;)" from file /usr/local/etc/suricata/suricata_47420_re0/rules/suricata.rules at line 40019
24/12/2022 – 22:23:09 - – [ERRCODE: SC_ERR_RULE_KEYWORD_UNKNOWN(102)] - unknown rule keyword ‘http_raw_cookie’.
24/12/2022 – 22:23:09 - – [ERRCODE: SC_ERR_INVALID_SIGNATURE(39)] - error parsing signature “drop tcp $EXTERNAL_NET any → $HOME_NET $HTTP_PORTS (msg:“SERVER-WEBAPP Multiple products DVR admin password leak attempt”; flow:to_server,established; content:”/device.rsp"; fast_pattern:only; http_uri; content:“uid=”; http_raw_cookie; content:“cmd=list”; http_uri; metadata:policy balanced-ips drop, policy max-detect-ips drop, policy security-ips drop, service http; reference:cve,2018-9995; classtype:web-application-attack; sid:46825; rev:2;)" from file /usr/local/etc/suricata/suricata_47420_re0/rules/suricata.rules at line 40116
24/12/2022 – 22:23:09 - – [ERRCODE: SC_ERR_INVALID_SIGNATURE(39)] - Can’t use file_data with flow:to_server or flow:from_client with http.
24/12/2022 – 22:23:09 - – [ERRCODE: SC_ERR_INVALID_SIGNATURE(39)] - error parsing signature “drop tcp $EXTERNAL_NET any → $HOME_NET $FILE_DATA_PORTS (msg:“SERVER-WEBAPP Pulse Connect Secure template injection attempt”; flow:to_server,established; content:”/dana-admin/auth/custompage.cgi"; fast_pattern:only; http_uri; file_data; content:“LoginPage.thtml”; metadata:policy balanced-ips drop, policy max-detect-ips drop, policy security-ips drop, service ftp-data, service http, service imap, service pop3; reference:cve,2020-8243; reference:url,Ivanti Community; classtype:attempted-admin; sid:57452; rev:1;)" from file /usr/local/etc/suricata/suricata_47420_re0/rules/suricata.rules at line 40366
24/12/2022 – 22:23:09 - – 2 rule files processed. 40932 rules successfully loaded, 13 rules failed
24/12/2022 – 22:23:09 - – Threshold config parsed: 0 rule(s) found
24/12/2022 – 22:23:09 - – 40935 signatures processed. 2097 are IP-only rules, 4933 are inspecting packet payload, 27964 inspect application layer, 108 are decoder event only
24/12/2022 – 22:23:09 - – [ERRCODE: SC_WARN_FLOWBIT(306)] - flowbit ‘file.zip&file.silverlight’ is checked but not set. Checked in 28582 and 2 other sigs
24/12/2022 – 22:23:09 - – [ERRCODE: SC_WARN_FLOWBIT(306)] - flowbit ‘file.pdf&file.ttf’ is checked but not set. Checked in 28585 and 1 other sigs
24/12/2022 – 22:23:09 - – [ERRCODE: SC_WARN_FLOWBIT(306)] - flowbit ‘file.xls&file.ole’ is checked but not set. Checked in 30990 and 1 other sigs
24/12/2022 – 22:24:13 - – Using 2 live device(s).
24/12/2022 – 22:24:13 - – devname [fd: 8] netmap:re0/R re0 opened
24/12/2022 – 22:24:13 - – devname [fd: 11] netmap:re0^/T re0^ opened
24/12/2022 – 22:24:13 - – devname [fd: 12] netmap:re0^/R re0^ opened
24/12/2022 – 22:24:13 - – devname [fd: 13] netmap:re0/T re0 opened
24/12/2022 – 22:24:13 - – all 18 packet processing threads, 4 management threads initialized, engine started.

Not surprised you see the same issue with Realtek. I think this issue is related to whatever the underlying cause is for the Suricata bug report I linked in my earlier post. Something is experiencing a hard lock, that’s why there is nothing logged. Yours is the first report I’ve seen about this on pfSense, though, and pfSense has been using the Netmap v14 API since early August of 2021. I will continue to monitor the linked Suricata Redmine Issue and implement whatever fix is produced from that.

Your unbound issues are unrelated to the specific Suricata problem. Unbound really does not like anything happening to an interface it is bound to. When the interface cycles for any reason (change of link state, netmap bringing the interface down and back up as it starts, etc.), unbound will usually bail.

You are also correct that your rule errors logged during initial startup are not related to the hang condition. Those do indicate real syntax issues with some rules, but they would not cause the hang. Suricata is simply discarding those rules during initialization and will not load them into the active rule set in memory.
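To see exactly which rules were discarded, the SIDs can be pulled out of the startup log. The following is a minimal Python sketch, not part of the Suricata package. The function name is hypothetical, and it assumes each "error parsing signature" entry is followed by the rule text containing a `sid:NNN;` option, as in the log posted above. It scans the whole text instead of going line by line, so an entry that got wrapped onto a second line is still picked up.

```python
import re

# For each "error parsing signature" entry, grab the first sid that follows it.
# DOTALL lets the lazy match run across a line break inside a wrapped entry.
FAILED_RULE = re.compile(r"error parsing signature.*?\bsid:(\d+);", re.DOTALL)


def failed_rule_sids(log_text):
    """Return the SIDs of rules Suricata reported as unparsable, in log order."""
    sids = []
    for match in FAILED_RULE.finditer(log_text):
        sid = int(match.group(1))
        if sid not in sids:  # keep each SID once
            sids.append(sid)
    return sids


if __name__ == "__main__":
    # Shortened stand-in for real suricata.log content.
    sample = (
        '[ERRCODE: SC_ERR_INVALID_SIGNATURE(39)] - error parsing signature '
        '"drop tcp any any -> any any (msg:"example"; sid:26470; rev:2;)" '
        'from file suricata.rules at line 31620\n'
        'engine started.\n'
    )
    print(failed_rule_sids(sample))
```

The number of SIDs it returns can be compared against the "13 rules failed" summary line, and the list can then be used to disable those rules so the startup log stays clean.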

Cool, thanks for clarifying. I’ll keep an eye on that other thread and at least now I can stop frantically messing around with settings trying to get it to work. Hit me up if you need me to test anything out for you on this machine!