IPFW woes and workarounds

Hiya,

I am running FreeBSD (12.2 at the moment), and I figured that
Suricata can be plugged into IPFW via a divert socket, where it then
runs as a packet filter just like the other filters plugged into
IPFW (forwarders, blacklisters, NATs, traffic shapers, etc.). This
is what would be called “IPS” mode.
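
A minimal sketch of such a setup looks like this (the rule number,
divert port, and interface name here are made up for illustration):

```shell
# Divert all IP traffic crossing igb0 to divert socket 8000
# (rule number, port, and interface are hypothetical)
ipfw add 1000 divert 8000 ip from any to any via igb0

# Attach suricata to that divert socket (IPS mode)
suricata -D -d 8000
```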

Now, this does not seem to be the most popular way of running
Suricata; there is not much to be found when searching the web.
Anyway, if something fancy can be done with IPFW, I love to do it;
on the other hand, I have no idea what people “normally” do with
Suricata (and I’m absolutely not into the high-profile paranoia
cyber-security thing; I prefer decent engineering work). :wink:

The bottom line is, I ran into a couple of problems; for some I
found solutions, for others I am still looking. So maybe it’s a
good idea to share. Both of them. :wink:

ITEM 1.

Version 6.0.2 does not run at all. 5.0.6 does run.

It stops transporting traffic after 2-5 minutes and then just sits
there like a dead fish. The friendly maintainer pointed me to this
issue: Bug #4478: freebsd: lockups due to mutex handling issues,
and it describes precisely what I am observing, except that I am
not (consciously) using netmap; I am using IPFW mode.

So I think I’ll wait until, hopefully, that issue finds a fix, and
then maybe my issue will also be gone. And if not, it will still be
soon enough to look deeper into this.

ITEM 2.

autofp runmode does not work with IPFW.

The executable appears to advertise this:

| IPFW  | autofp  | Multi threaded IPFW IPS mode with respect to flow
|       | workers | Multi queue IPFW IPS mode with one thread per queue
| -------------------------------------------------------------------

But in practice, really, no. For example, the IPFW rules might look
like this:

$ ipfw show 1860-1870
01864    26    5200 count log ip from any to any 9103
01865 13122 3403131 divert 8677 ip from any to any
01866    21    2030 count log ip from any to any 9103

$ ipfw show 2870-2884
02874     0       0 count log ip from any to any out via lo0
02875 17105 3615040 divert 8677 ip from any to any
02884    63   47588 count log ip from any to any out via lo0

And then Suricata runs like this:
$ /usr/local/bin/suricata -D --runmode autofp -d 8677

And now what happens to our packets:

ipfw: 1 Count TCP 203.0.113.12:11363 198.51.100.224:9103 in via igb0
ipfw: 1864 Count TCP 203.0.113.12:11363 198.51.100.224:9103 in via igb0
ipfw: 2884 Count TCP 203.0.113.12:11363 198.51.100.224:9103 out via lo0
ipfw: 2894 Count TCP 203.0.113.12:11363 198.51.100.224:9103 out via lo0
ipfw: 2964 Count TCP 203.0.113.12:11363 198.51.100.224:9103 out via lo0
ipfw: 3114 Count TCP 203.0.113.12:11363 198.51.100.224:9103 out via lo0
ipfw: 3115 Unreach 13 TCP 203.0.113.12:11363 198.51.100.224:9103 out via lo0

They leave IPFW for Suricata at the first divert rule (1865), but
come back in behind the second (2875)! And they come back with bogus
metadata. This happens in an unpredictable fashion. (Given what the
thread model of Suricata looks like in autofp mode, this is
understandable.) It obviously blows up the whole logic of these IPFW
rules, and as a result my application complained about un-decryptable
TLS; pieces of the data were probably missing.
So yes, it may work somehow, at first, or for a while. And then
creepy application errors may appear, which can be more or less
laborious to pinpoint.
So be warned: autofp runmode does not work with IPFW.

I tried autofp because in workers runmode I had one Suricata thread
nicely locked at 100.0% CPU, and this appeared to look better with
autofp. Eventually I found another posting here stating that IPFW
does indeed support only one thread. That’s a bit sad. (Couldn’t
this be somehow hacked, like feeding Suricata from two or more
divert sockets and doing the load levelling in IPFW?)
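
Purely as a sketch: since Suricata currently accepts only a single
-d socket, this cannot work today, but the IPFW side of such a hack
could split traffic on the external peer address, so that both
directions of a flow always land on the same socket (rule numbers
and the interface name igb0 are made up):

```shell
# Lower half of the address space -> divert socket 8677
ipfw add 1865 divert 8677 ip from 0.0.0.0/1 to any in recv igb0
ipfw add 1866 divert 8677 ip from any to 0.0.0.0/1 out xmit igb0
# Upper half -> divert socket 8678
ipfw add 1867 divert 8678 ip from 128.0.0.0/1 to any in recv igb0
ipfw add 1868 divert 8678 ip from any to 128.0.0.0/1 out xmit igb0
```

Keying on the remote address in both directions (source on the way
in, destination on the way out) keeps each flow on one socket, which
per-packet tricks like ipfw’s prob action would not.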

ITEM 3.

suricata crashes all the time.

This is by design. When IPFW gets a packet back after Suricata has
looked into it, and IPFW then decides to reject that packet (for
some other reason), Suricata appears as the sender of the packet
and therefore receives an EACCES error from the kernel. And that
makes Suricata crash.

The same happens when IPFW decides to send the packet onto a full
queue (“no buffer space”): Suricata then receives ENOBUFS, and
crashes.

This can be fixed with a patch like this:

--- src/source-ipfw.c.orig      2021-03-01 22:46:04.000000000 +0100
+++ src/source-ipfw.c   2021-05-31 16:55:58.090067000 +0200
@@ -577,8 +577,11 @@
                     SCLogWarning(SC_WARN_IPFW_XMIT,"Write to ipfw divert socket failed: %s",strerror(r));
                     IPFWMutexUnlock(nq);
                     SCReturnInt(TM_ECODE_FAILED);
+                case ENOBUFS:
+                    SCLogWarning(SC_WARN_IPFW_XMIT,"Write to ipfw divert socket failed: %s",strerror(r));
                 case EHOSTDOWN:
                 case ENETDOWN:
+                case EACCES:
                     break;
             }
         }

ITEM 4.

Lots of invalid header checksum alerts.

These alerts come from rule 1:2200073, but when IPFW puts a packet
onto a divert socket, it may have an invalid checksum by design.
This has to do with the hop count: at that point the TTL has
already been decremented, but the checksum has not yet been
recomputed (probably because that recomputation might be offloaded,
depending on where the packet will finally be delivered, which is
not known yet).

The easiest solution was to put 1:2200073 into disable.conf.
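
With suricata-update, that is one SID per line in disable.conf
(the path depends on your setup; /usr/local/etc/suricata/disable.conf
would be typical on FreeBSD):

```
# disable.conf: suppress the header checksum rule
2200073
```

Alternatively, if I read the options right, Suricata’s `-k none`
switch turns off checksum validation altogether, which would be a
broader hammer than disabling this single rule.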

ITEM 5.

Millions of invalid acks.

This just happened today. I suddenly noticed the log had grown to
10 GB within an hour (and the machine was stuck compressing that),
all from these:

[1:2210020:2] SURICATA STREAM ESTABLISHED packet out of window
[1:2210029:2] SURICATA STREAM ESTABLISHED invalid ack
[1:2210045:2] SURICATA STREAM Packet with invalid ack

They come from TLS bulk-transfer streams, and I currently have no
idea why. The tcpdump looks sane at first glance, and the
applications work fine.

For now these also go into disable.conf.

IPFW support is essentially unmaintained. We should probably mark it as such, or schedule it for removal altogether. This is the first report we’ve had in ages, which, given the severity of the report, probably means nobody is using it. Can you open tickets for the first three items? We’ll assign them to “community”, so they may not get fixes, but if anyone is interested in working on this, our ticket tracker is the place people are most likely to look.

Thanks!