Suricata 6.0.8 hangs on startup

We are using Suricata as part of a solution we are building for our customers. We regularly run automated tests with Suricata running inside VMs created with Vagrant. We recently upgraded to Suricata 6.0.8 and have started seeing Suricata occasionally hang on startup. We feed a PCAP file into Suricata via the command line and load the Emerging Threats rule set. Today, I modified the test to create the VM once and then run Suricata identically ten times in a row. Suricata hung on the 4th run.
Suricata_bad_boot.log (49.7 KB)
Suricata_good_boot.log (50.9 KB)

This is running on Red Hat Enterprise Linux 7.

Attached are two log files, with the bad one showing where Suricata is hanging. Please let me know if you have any ideas on how to chase this further.

Can you paste the output of suricata --build-info and also the config file you used?
Ideally, rerun the test with 6.0.10 as well, just to make sure the issue is still there.

You could also run perf top -p $(pidof suricata) to check whether some function is producing a lot of overhead.

I’ve repeated the test with Suricata 6.0.10 and it fails in the same way. I built 6.0.10 from source and added enough debug output to see that the startup thread, which spawns the various threads and waits for each one to reach INIT_DONE (in TmThreadSpawn() in tm-threads.c), gets stuck: it never returns from a call to usleep() in TmThreadWaitForFlag(). The thread it gets stuck spawning varies, so I don’t think the problem is thread specific. I previously ran Suricata 4.x from the Red Hat 7 repo on this same Linux box with no issues. Here is the kernel information:

uname -a

Linux ids7 3.10.0-1160.el7.x86_64 #1 SMP Tue Aug 18 14:50:17 EDT 2020 x86_64 x86_64 x86_64 GNU/Linux
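When Suricata hangs as described above (startup thread never returning from usleep() in TmThreadWaitForFlag()), a per-thread snapshot of the stuck process can show which threads exist and where each one is sleeping. The helper below is a minimal sketch that only reads standard Linux /proc files (task/, comm, wchan); the function name list_threads is hypothetical and not part of Suricata.

```shell
#!/bin/sh
# Hypothetical helper (not part of Suricata): print one line per thread of a
# running process with its thread ID, thread name (comm) and wait channel
# (wchan: the kernel function the thread is sleeping in, or 0 if hidden).
list_threads() {
    local pid="$1" task tid name wchan
    if [ ! -d "/proc/$pid/task" ]; then
        echo "no such process: $pid" >&2
        return 1
    fi
    for task in /proc/"$pid"/task/*; do
        tid=${task##*/}
        # A thread may exit between the glob and the read; print '?' then.
        name=$(cat "$task/comm" 2>/dev/null || echo '?')
        wchan=$(cat "$task/wchan" 2>/dev/null || echo '?')
        printf '%s %s %s\n' "$tid" "$name" "$wchan"
    done
}
```

For example, while Suricata is hung: list_threads "$(pidof suricata)". If gdb is installed, gdb -p "$(pidof suricata)" -batch -ex 'thread apply all bt' prints full stack traces for every thread, which should show whether the main thread is still inside usleep() called from TmThreadWaitForFlag().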

Below is the Suricata build info:

sudo suricata --build-info

This is Suricata version 6.0.10 RELEASE
Features: PCAP_SET_BUFF AF_PACKET HAVE_PACKET_FANOUT LIBCAP_NG HAVE_HTP_URI_NORMALIZE_HOOK PCRE_JIT HAVE_NSS HAVE_LIBJANSSON TLS TLS_GNU MAGIC RUST
SIMD support: none
Atomic intrinsics: 1 2 4 8 byte(s)
64-bits, Little-endian architecture
GCC version 4.8.5 20150623 (Red Hat 4.8.5-44), C version 199901
compiled with _FORTIFY_SOURCE=0
L1 cache line size (CLS)=64
thread local storage method: __thread
compiled with LibHTP v0.5.42, linked against LibHTP v0.5.42

Suricata Configuration:
AF_PACKET support: yes
eBPF support: no
XDP support: no
PF_RING support: no
NFQueue support: no
NFLOG support: no
IPFW support: no
Netmap support: no using new api: no
DAG enabled: no
Napatech enabled: no
WinDivert enabled: no

Unix socket enabled: yes
Detection enabled: yes

Libmagic support: yes
libnss support: yes
libnspr support: yes
libjansson support: yes
hiredis support: no
hiredis async with libevent: no
Prelude support: no
PCRE jit: yes
LUA support: no
libluajit: no
GeoIP2 support: no
Non-bundled htp: no
Hyperscan support: no
Libnet support: no
liblz4 support: yes
HTTP2 decompression: no

Rust support: yes
Rust strict mode: no
Rust compiler path: /root/.cargo/bin/rustc
Rust compiler version: rustc 1.67.1 (d5a82bbd2 2023-02-07)
Cargo path: /root/.cargo/bin/cargo
Cargo version: cargo 1.67.1 (8ecd4f20a 2023-01-10)
Cargo vendor: yes

Python support: yes
Python path: /usr/bin/python2.7
Install suricatactl: yes
Install suricatasc: yes
Install suricata-update: yes

Profiling enabled: no
Profiling locks enabled: no

Plugin support (experimental): yes

Development settings:
Coccinelle / spatch: no
Unit tests enabled: no
Debug output enabled: no
Debug validation enabled: no

Generic build parameters:
Installation prefix: /usr
Configuration directory: /etc/suricata/
Log directory: /var/log/suricata/

--prefix /usr
--sysconfdir /etc
--localstatedir /var
--datarootdir /usr/share

Host: x86_64-unknown-linux-gnu
Compiler: gcc (exec name) / g++ (real)
GCC Protect enabled: no
GCC march native enabled: no
GCC Profile enabled: no
Position Independent Executable enabled: no
CFLAGS -g -O2 -std=gnu99 -I${srcdir}/…/rust/gen -I${srcdir}/…/rust/dist
PCAP_CFLAGS
SECCFLAGS

Can you also post the config file so we can reproduce this?

What specs does the system have?

Also, 3.10 is a very old kernel; it reached end of life in 2017.

We are using Vagrant with VirtualBox 6.1 to create the VM running Red Hat. The VM is allocated 8 GB of RAM and 2 CPU cores.

Here’s the config file we are using:
suricata_tests_config.yaml (2.9 KB)

Here’s the pcap file we are using:
alert-testmyids-async.pcap (583 Bytes)

We are using this version of Red Hat because of a customer requirement. We realize it is an old kernel. I’m just surprised at how reproducible this hang is.

Our automated tests start the VM and then run Suricata manually from the bash shell 10 times, using the config and pcap above. On most test runs, I can get it to hang by the 4th or 5th invocation, but sometimes sooner.

I can share my debug output and my modified tm-threads.c if you would like to confirm the failure I am seeing.

Since you mention a pcap, what does your run command look like? I would like to see if I can reproduce it on a system as well.

If you have the chance to test a more recent kernel, that would also be worth a try, just to rule out that it’s related to the kernel.

Here is the command line that I am using:

sudo suricata -vvvv -l /var/log/suricata --runmode single -c suricata_tests_config.yaml -r alert-testmyids-async.pcap
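The ten-run test described in this thread can be scripted so that a hang is detected and reported instead of blocking the test forever. This is a minimal sketch assuming GNU coreutils timeout is available (it is on RHEL 7); the function name run_repeatedly and the time limits are hypothetical choices, not part of Suricata.

```shell
#!/bin/sh
# Hypothetical helper (not part of Suricata): run a command up to RUNS times
# and stop at the first run that does not exit within LIMIT seconds.
# GNU timeout exits with 124 when the limit is hit and SIGTERM is sent; with
# -k 10, a process that ignores SIGTERM gets SIGKILL 10 seconds later, which
# typically shows up as status 137 (128 + 9) instead.
run_repeatedly() {
    local runs="$1" limit="$2" i=1 rc
    shift 2
    while [ "$i" -le "$runs" ]; do
        rc=0
        timeout -k 10 "$limit" "$@" || rc=$?
        if [ "$rc" -eq 124 ] || [ "$rc" -eq 137 ]; then
            echo "run $i hung (no exit within ${limit}s)"
            return 1
        elif [ "$rc" -ne 0 ]; then
            echo "run $i failed with exit status $rc"
            return 2
        fi
        echo "run $i completed"
        i=$((i + 1))
    done
    return 0
}
```

For example, from a root shell (so timeout signals Suricata directly rather than going through sudo): run_repeatedly 10 300 suricata -vvvv -l /var/log/suricata --runmode single -c suricata_tests_config.yaml -r alert-testmyids-async.pcap. The 300-second limit is an arbitrary assumption; it only needs to be comfortably above a normal startup with the full rule set loaded.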

I added some debug output to the Suricata code, mainly in tm-threads.c, to arrive at my conclusion. I’ve attached the output from the latest run with debugging. I logged all my added messages as invalid argument errors just to make sure they would show up on the console.

Here’s the log output:
Suricata_boot_bad_debug.log (59.1 KB)

Here’s the modified code file:
tm-threads.c (72.1 KB)

I’m working this morning on getting the generic RHEL8 box from Vagrant Cloud running in my setup.

I wasn’t able to reproduce it on ArchLinux, Ubuntu 22.04 LTS and Debian Bullseye 11 VMs setup with 8GB of memory and 2 vCPUs.

What Vagrant version are you using, and with which virtualization layer?

Also, the config is missing quite a few relevant parts, as you can see from the ERROR messages.

Ideally, you can find a reproducible case on one of the more open/modern distributions, which would make it easier for us to test.

We are using Vagrant 2.2.19 and VirtualBox 6.1.42.

I had issues trying to get an online RHEL 8 Vagrant box running in my setup, so I will have to build one from scratch.

I have now upgraded my setup to Vagrant 2.3.4 to see whether I can still reproduce the issue. If it still happens, I will probably upgrade VirtualBox to the latest 7.x release and retest.

Yes, I know the config is not complete and will be changing that. However, I wanted to minimize changes while I had this repeatable startup failure.

Andreas,
I wanted to give you an update on this issue. After a lot of digging and testing, I discovered that the CPU was getting stalled within the VM I was running with VirtualBox. Further research led me to this thread on the VirtualBox forum (virtualbox.org • View topic - HMR3Init: Attempting fall back to NEM (Hyper-V is active)). It turns out that Hyper-V was still enabled on this Windows host machine, even though I thought I had disabled it. So I disabled Secure Boot, disabled Hyper-V following the instructions, and then re-enabled Secure Boot. Since then, I have not had a single CPU stall or Suricata hang in my testing (about 120 restarts of Suricata). I will continue testing and let you know if anything else pops up, but I believe that is the fix. Sorry for pointing the blame at Suricata at first.
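The post does not say how the CPU stall inside the VM was spotted. One place such stalls usually surface is the guest kernel log, as RCU stall warnings or soft/hard lockup reports. The filter below is a minimal sketch assuming those standard Linux message formats; the function name find_cpu_stalls is hypothetical and not part of Suricata or VirtualBox.

```shell
#!/bin/sh
# Hypothetical helper (not part of Suricata): read kernel log text on stdin
# and print lines that look like Linux CPU stall or lockup reports, for
# example "INFO: rcu_sched self-detected stall on CPU" or
# "BUG: soft lockup - CPU#0 stuck for 22s!".
# Exit status follows grep: 0 if something matched, 1 if nothing did.
find_cpu_stalls() {
    grep -iE 'stalls? on CPU|soft lockup|hard lockup'
}
```

For example, inside the guest after a hang: dmesg | find_cpu_stalls, or journalctl -k | find_cpu_stalls to include earlier boots' kernel messages if the journal is persistent. An empty result does not prove the host is healthy, but repeated matches alongside Suricata hangs point at the virtualization layer rather than Suricata.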

Glad to know it’s more of an underlying issue. Keep us posted, and feel free to flag your post as a possible solution in case someone else runs into this.