After a config change in suricata.yaml:
dpdk: #rx-descriptors: 1024 # 1024
rx-descriptors: 16384 # 1024 #tx-descriptors: 1024 # 1024
tx-descriptors: 16384 # 1024
When at night log rotates passes by to reload suricata after moving logfiles:
[1435078 - US] 2023-08-15 03:15:01 Error: unix-manager: Command server: select() fatal error: Bad file descriptor
[1435078 - US] 2023-08-15 03:15:01 Error: unix-manager: Fatal error on unix socket
And indeed in the morning suricatasc is not repsonding and have to ctrl break it.
This seems related because after setting the values back to 1024, removing the ābadā socket file ā¦
$ rm /var/run/suricata/suricata-command.socket
ā¦ and starting suricata, it goes well at nigh.
Any thoughts or tips please? A tx increase is I guess not needed on spanports only delivering packets?
Red Hat Enterprise Linux release 8.8 (Ootpa)
4.18.0-477.15.1.el8_8.x86_64 #1 SMP Fri Jun 2 08:27:19 EDT 2023 x86_64 x86_64 x86_64 GNU/Linux
This is Suricata version 7.0.1-dev (becb8cefc 2023-08-11)
Running production since a week, so a little bit less play ground. Can give it at try, but maybe is it a ābuild upā of some memory or so during hours causing this. Give me a sec
sure, my first thought was having too little RAM might be the problem - but not sure. I will try to reproduce the issue on my side just to check the immediate response.
How much RAM do you have there installed and how many hugepages have you allocated?
Ok looks indeed like Iām putting the bounderies to far. Noticed I only set this for one interface:
rx-descriptors: 16384 # 1024
tx-descriptors: 1024 # 1024
So modified it also for the default interface (3 interfaces depend on it) and now suricata totally fails:
Aug 15 10:08:20 scomp1720 suricata[1643560]: ice_alloc_rx_queue_mbufs(): Failed to allocate mbuf for RX
Aug 15 10:08:20 scomp1720 suricata[1643560]: ice_rx_queue_start(): Failed to allocate RX queue mbuf
Aug 15 10:08:20 scomp1720 suricata[1643560]: ice_dev_start(): fail to start Rx queue 15
Aug 15 10:08:20 scomp1720 suricata[1643560]: Error: dpdk: 0000:84:00.1: error (Input/output error) during device startup [ReceiveDPDKThreadInit:source-dpdk.c:540]
Aug 15 10:08:20 scomp1720 suricata[1643560]: Config: flow-manager: using 1 flow manager threads [FlowManagerThreadSpawn:flow-manager.c:948]
Aug 15 10:08:20 scomp1720 suricata[1643560]: Perf: threads: Setting prio 0 for thread āFM#01ā, thread id 1643975 [TmThreadSetupOptions:tm-threads.c:881]
Aug 15 10:08:20 scomp1720 suricata[1643560]: Config: flow-manager: using 1 flow recycler threads [FlowRecyclerThreadSpawn:flow-manager.c:1154]
Aug 15 10:08:20 scomp1720 suricata[1643560]: Perf: threads: Setting prio 0 for thread āFR#01ā, thread id 1643976 [TmThreadSetupOptions:tm-threads.c:881]
Aug 15 10:08:20 scomp1720 suricata[1643560]: Perf: threads: Setting prio 0 for thread āCWā, thread id 1643977 [TmThreadSetupOptions:tm-threads.c:881]
Aug 15 10:08:20 scomp1720 suricata[1643560]: Perf: threads: Setting prio 0 for thread āCSā, thread id 1643978 [TmThreadSetupOptions:tm-threads.c:881]
Aug 15 10:08:20 scomp1720 suricata[1643560]: Info: unix-manager: unix socket ā/var/run/suricata/suricata-command.socketā [UnixNew:unix-manager.c:136]
Aug 15 10:08:20 scomp1720 suricata[1643560]: Perf: threads: Setting prio 0 for thread āUSā, thread id 1643979 [TmThreadSetupOptions:tm-threads.c:881]
Aug 15 10:08:20 scomp1720 suricata[1643560]: Error: threads: thread āW#26-84:00.0ā failed to initialize: flags 0145 [TmThreadWaitOnThreadInit:tm-threads.c:1955]
Aug 15 10:08:20 scomp1720 suricata[1643560]: Error: suricata: Engine initialization failed, abortingā¦ [SuricataMain:suricata.c:2991]
Aug 15 10:08:20 scomp1720 suricata[1643560]: ixgbe_alloc_rx_queue_mbufs(): RX mbuf alloc failed queue_id=15
Aug 15 10:08:20 scomp1720 suricata[1643560]: ixgbe_dev_rx_queue_start(): Could not alloc mbuf for queue:15
Aug 15 10:08:20 scomp1720 suricata[1643560]: ixgbe_dev_start(): Unable to start rxtx queues
Aug 15 10:08:20 scomp1720 suricata[1643560]: ixgbe_dev_start(): failure in ixgbe_dev_start(): -1
Aug 15 10:08:20 scomp1720 suricata[1643560]: ice_alloc_rx_queue_mbufs(): Failed to allocate mbuf for RX
Aug 15 10:08:20 scomp1720 suricata[1643560]: ice_rx_queue_start(): Failed to allocate RX queue mbuf
Aug 15 10:08:20 scomp1720 suricata[1643560]: ice_dev_start(): fail to start Rx queue 15
Aug 15 10:08:20 scomp1720 suricata[1643560]: Error: dpdk: 0000:0f:00.0: error (Input/output error) during device startup [ReceiveDPDKThreadInit:source-dpdk.c:540]
Aug 15 10:08:20 scomp1720 suricata[1643560]: Config: dpdk: RTE_ETH_RX_OFFLOAD_VLAN_STRIP - available [DumpRXOffloadCapabilities:runmode-dpdk.c:997]
Aug 15 10:08:20 scomp1720 suricata[1643560]: Config: dpdk: RTE_ETH_RX_OFFLOAD_IPV4_CKSUM - available [DumpRXOffloadCapabilities:runmode-dpdk.c:999]
Aug 15 10:08:20 scomp1720 suricata[1643560]: Config: dpdk: RTE_ETH_RX_OFFLOAD_UDP_CKSUM - available [DumpRXOffloadCapabilities:runmode-dpdk.c:1001]
Aug 15 10:08:20 scomp1720 suricata[1643560]: Config: dpdk: RTE_ETH_RX_OFFLOAD_TCP_CKSUM - available [DumpRXOffloadCapabilities:runmode-dpdk.c:1003]
Aug 15 10:08:20 scomp1720 suricata[1643560]: Config: dpdk: RTE_ETH_RX_OFFLOAD_TCP_LRO - NOT available [DumpRXOffloadCapabilities:runmode-dpdk.c:1005]
Aug 15 10:08:20 scomp1720 suricata[1643560]: Config: dpdk: RTE_ETH_RX_OFFLOAD_QINQ_STRIP - available [DumpRXOffloadCapabilities:runmode-dpdk.c:1007]
Aug 15 10:08:20 scomp1720 suricata[1643560]: Config: dpdk: RTE_ETH_RX_OFFLOAD_OUTER_IPV4_CKSUM - available [DumpRXOffloadCapabilities:runmode-dpdk.c:1009]
Aug 15 10:08:20 scomp1720 suricata[1643560]: Config: dpdk: RTE_ETH_RX_OFFLOAD_MACSEC_STRIP - NOT available [DumpRXOffloadCapabilities:runmode-dpdk.c:1011]
Aug 15 10:08:20 scomp1720 suricata[1643560]: Config: dpdk: RTE_ETH_RX_OFFLOAD_VLAN_FILTER - available [DumpRXOffloadCapabilities:runmode-dpdk.c:1017]
Aug 15 10:08:20 scomp1720 suricata[1643560]: Config: dpdk: RTE_ETH_RX_OFFLOAD_VLAN_EXTEND - available [DumpRXOffloadCapabilities:runmode-dpdk.c:1019]
Aug 15 10:08:20 scomp1720 suricata[1643560]: Config: dpdk: RTE_ETH_RX_OFFLOAD_SCATTER - available [DumpRXOffloadCapabilities:runmode-dpdk.c:1021]
Aug 15 10:08:20 scomp1720 suricata[1643560]: Config: dpdk: RTE_ETH_RX_OFFLOAD_TIMESTAMP - available [DumpRXOffloadCapabilities:runmode-dpdk.c:1023]
Aug 15 10:08:20 scomp1720 suricata[1643560]: Config: dpdk: RTE_ETH_RX_OFFLOAD_SECURITY - NOT available [DumpRXOffloadCapabilities:runmode-dpdk.c:1025]
Aug 15 10:08:20 scomp1720 suricata[1643560]: Config: dpdk: RTE_ETH_RX_OFFLOAD_KEEP_CRC - available [DumpRXOffloadCapabilities:runmode-dpdk.c:1027]
Aug 15 10:08:20 scomp1720 suricata[1643560]: Config: dpdk: RTE_ETH_RX_OFFLOAD_SCTP_CKSUM - NOT available [DumpRXOffloadCapabilities:runmode-dpdk.c:1029]
Aug 15 10:08:20 scomp1720 suricata[1643560]: Config: dpdk: RTE_ETH_RX_OFFLOAD_OUTER_UDP_CKSUM - NOT available [DumpRXOffloadCapabilities:runmode-dpdk.c:1031]
Aug 15 10:08:20 scomp1720 suricata[1643560]: Config: dpdk: RTE_ETH_RX_OFFLOAD_RSS_HASH - available [DumpRXOffloadCapabilities:runmode-dpdk.c:1033]
Aug 15 10:08:20 scomp1720 suricata[1643560]: Config: dpdk: RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT - available [DumpRXOffloadCapabilities:runmode-dpdk.c:1036]
Aug 15 10:08:20 scomp1720 suricata[1643560]: Config: dpdk: 0000:84:00.1: RSS enabled for 16 queues [DeviceInitPortConf:runmode-dpdk.c:1112]
Aug 15 10:08:20 scomp1720 suricata[1643560]: Config: dpdk: 0000:84:00.1: IP, TCP and UDP checksum validation offloaded [DeviceInitPortConf:runmode-dpdk.c:1152]
Aug 15 10:08:20 scomp1720 suricata[1643560]: Config: dpdk: 0000:84:00.1: setting MTU to 1518 [DeviceConfigure:runmode-dpdk.c:1478]
Aug 15 10:08:20 scomp1720 suricata[1643560]: Config: dpdk: 0000:84:00.1: creating packet mbuf pool mempool_0000:84:00.1 of size 65535, cache size 511, mbuf size 2176 [DeviceConfigureQueues:runmode-dpdk.c:1183]
Aug 15 10:08:20 scomp1720 suricata[1643560]: Config: dpdk: 0000:84:00.1: rx queue setup: queue:0 port:4 rx_desc:4096 tx_desc:1024 rx: hthr
you mean that you restart Suricata by stopping it and starting it again, right?
At first, I thought by āreloading Suricataā you mean reload-rules via suricatasc.
I can reproduce the issue now by running and stopping Suricata when running with higher count of Suricata workers (e.g. I am running 18+18 cores in IPS mode with Mellanox Connect-X5 card and with mempool-size set to 1048575 elements, number of RX/TX descriptors donāt seem to play a role in reproducing the error log).
But RX/TX descriptors seem to play a role when running suricatasc in parallel to Suricata.
In my case if I have :
1024 descriptors:
and I run suricatasc then the shutdown goes well
otherwise if I donāt run suricatasc then the error appears
32768 descriptors:
error appears irrespective to suricatasc
I am only curious about the other thing you wrote:
suricatasc is not repsonding and have to ctrl break it
At least in my case, the shutdown of Suricata doesnāt get stuck. If you used any - what command did you use for suricatasc? Or can you be a bit more specific about your suricatasc/log rotate configs/commands?
This needs more attention, Iām creating a Redmine ticket and Iāll try to fix it soon.
I think it is safer to set RX/TX descriptors to the same value as some DPDK internals might rely on having the same number of descriptors. I can see your point but Iāve been burned on this at least once so now I usually just set it to the same number - possibly a good idea to only have 1 setting for descriptors