Running DPDK 24.07.0 with Suricata 8 from the main repo (8.0.0-dev, dd71ef0af 2024-11-05) on Red Hat 8 (kernel 4.18.0-553.22.1.el8_10.x86_64) for some weeks now. After a restart of Suricata, most of the time one or more 100G spanports go inactive: no link seen.
Today the following worked again to get the spanport active:
$ systemctl stop suricata
$ /usr/local/bin/dpdk-devbind.py -b ice 0000:84:00.1
$ /usr/local/bin/dpdk-devbind.py --bind=vfio-pci 0000:84:00.1
$ systemctl start suricata
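The same workaround could be wrapped in a small script for the case where several ports need rebinding after the nightly restart. This is only a sketch: the names rebind_ports, run and DRY_RUN are made up for illustration, while the devbind path, the ice and vfio-pci drivers and the PCI address are taken from the commands above. With DRY_RUN=1 it only prints what it would run.

```shell
#!/usr/bin/env bash
# Sketch of the manual workaround above. rebind_ports, run and DRY_RUN are
# illustrative names, not part of DPDK or Suricata.
DEVBIND=/usr/local/bin/dpdk-devbind.py

# Run a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Stop Suricata, bounce each given PCI address through the kernel ice
# driver and back to vfio-pci, then start Suricata again.
rebind_ports() {
    run systemctl stop suricata || return 1
    for pci in "$@"; do
        run "$DEVBIND" -b ice "$pci" || return 1
        run "$DEVBIND" --bind=vfio-pci "$pci" || return 1
    done
    run systemctl start suricata
}

# Example (prints the four commands without running them):
#   DRY_RUN=1 rebind_ports 0000:84:00.1
```

Without DRY_RUN the function needs root, just like the manual commands.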
What to do? Should I go back to DPDK 24.03.0, which had no issues, or do you have some ideas/tips for debugging?
We can work on the bug report and then reach out to Intel through the DPDK “users” mailing list.
I guess I would need more info, so how do you restart Suricata? Is it a rule reload, or stopping it and then starting it again?
I guess the spanport information is not crucial here as it is a configuration outside of the current ice port, am I right?
Does it happen right after Suricata startup, or does Suricata usually need to run for a few hours/days before it happens? If you start and stop Suri immediately, can you still lose the link?
Especially if it can be reproduced immediately, can you test it with other DPDK apps as well, e.g. dpdk-testpmd?
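A possible way to run that test, as a rough sketch only: it assumes Suricata is stopped, hugepages are set up, and the port is still bound to vfio-pci. The PCI address is the one from the first post, and testpmd_cmd is just an illustrative helper that prints the command line rather than running it.

```shell
#!/usr/bin/env bash
# Sketch: build a dpdk-testpmd command line for a single port, to check
# whether the link comes up outside Suricata. testpmd_cmd is an
# illustrative helper name; the PCI address is taken from this thread.
PCI="${PCI:-0000:84:00.1}"

# Print the testpmd command line for one PCI device:
#   -a <pci>  EAL allow-list, so only this device is probed
#   --        separates EAL options from testpmd options
#   -i        start with the interactive testpmd> prompt
testpmd_cmd() {
    echo "dpdk-testpmd -a $1 -- -i"
}

testpmd_cmd "$PCI"
# At the testpmd> prompt, "show port info 0" reports the link status
# and "quit" exits.
```

Starting and quitting testpmd a few times in a row should show whether the link loss depends on Suricata at all.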
I’ve noticed that ice seems to have a delayed startup: for the first second or two, Suricata says it has started the workers and everything, but the card only really starts to receive packets after those 2 seconds. This can also be observed with the aforementioned testpmd app, which shows something like “Link status changed”. So maybe the startup lag is a bit longer in this version? Did you give it enough time to start before killing it?
So I assume you monitor multiple ice ports and they randomly go inactive. Is that correct, or have you seen one particular port being more prone to this?
Rule reload has no effect; it is indeed a service restart, or a stop and start via systemctl.
Yes you are right
I need to test that one. It restarts every night due to log rotation, and then an interface usually disappears.
I need to test that one
You mean Suricata? I should check the logs; sometimes it takes more time because of closing pcap logging. But nothing changed except the DPDK version and running Suricata daily.
Last week three ports were gone; yesterday and today one (the same port both times).