Victor, you may be on to something. Please see https://redmine.openinfosecfoundation.org/issues/4096 for the reply on 15/12/2020; changing the usleep values has a significant impact on the issue.
Since the host is KVM and the issue is related to usleep(), can we check whether tweaking the halt_poll_ns kvm module parameter on the host solves the problem?
Check the current default value:
root@virtual-machine:~$ cat /sys/module/kvm/parameters/halt_poll_ns
200000
Temporary change for testing:
echo 10000 > /sys/module/kvm/parameters/halt_poll_ns
Permanent change by modifying /etc/modprobe.d/kvm.conf:
[root@localhost ~]# cat /etc/modprobe.d/kvm.conf
# Setting modprobe kvm_intel/kvm_amd nested = 1
# only enables Nested Virtualization until the next reboot or
# module reload. Uncomment the option applicable
# to your system below to enable the feature permanently.
#
# User changes in this file are preserved across upgrades.
#
# For Intel
#options kvm_intel nested=1
#
# For AMD
#options kvm_amd nested=1
options kvm halt_poll_ns=10000
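To apply the permanent setting without a reboot, the kvm modules can be reloaded, for example like this (only while no VMs are running; use kvm_amd instead of kvm_intel on AMD hosts):
modprobe -r kvm_intel kvm
modprobe kvm_intel
cat /sys/module/kvm/parameters/halt_poll_ns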
References:
Bug #1724614 “[KVM] Lower the default for halt_poll_ns to 200000…” (Ubuntu linux package, launchpad.net)
The KVM halt polling system — The Linux Kernel documentation
KVM settings - IBM Documentation
Hi, I set it to 10000, and even went down to 0, but no noticeable change.
This is on a Proxmox host (KVM) with an OPNsense VM (Suricata 6.0.3).
Comparing Suricata 5.0.7 to 6.0.3, the CPU load almost doubled, though those measurements obviously fluctuate.
While monitoring this VM (Zabbix via SNMP) I found a rather steady rate of context switches:
5.0.7: ~800 sw/s
6.0.3: ~12000 sw/s
That's a factor of 15.
To be sure, I checked inside the VM with top and indeed saw VCSW at ~12000.
@janssensm
Not sure if this is going to be helpful. Can you please try the tweaks (clock policy) suggested at https://www.reddit.com/r/VFIO/comments/80p1q7/high_kvmqemu_cpu_utilization_when_windows_10/
@srini38
Proxmox uses QEMU/KVM, but doesn't use libvirt. Also, that setting was a fix for a Windows guest, not Linux/FreeBSD.
Nevertheless I'm curious whether your advice has any effect.
In the Reddit post you mentioned I saw a setting for HPET, which seems to be the main change.
Proxmox by default boots a VM with HPET enabled, which is also listed in dmesg inside the VM. So just to see if disabling it would have any effect, I booted it with the -no-hpet argument.
This left total CPU usage almost the same as before, but it now uses more CPU system time and less CPU user time.
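For reference, on Proxmox I passed that extra argument with something like the following (the VM ID 100 is just an example):
qm set 100 --args '-no-hpet'
qm set 100 --delete args   # revert afterwards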
I mentioned the context switching statistics because of the large increase; extra context switching normally shows up as higher CPU load. The previous message was measured on my production VM. In the meantime I set up a test VM with no load (no network clients), and there the difference is even greater:
5.0.7: ~400 sw/s
6.0.3: ~12000 sw/s
That's a factor of 30.
Also, when I look at the Suricata threads in top:
PID USERNAME VCSW IVCSW READ WRITE FAULT TOTAL PERCENT COMMAND
55082 root 5734 2 0 0 0 0 0.00% suricata{FM#01}
55082 root 5775 1 0 0 0 0 0.00% suricata{FR#01}
55082 root 184 0 0 0 0 0 0.00% suricata{suricata}
55082 root 21 0 0 0 0 0 0.00% suricata{W#01-vtnet2^}
55082 root 20 0 0 0 0 0 0.00% suricata{W#01-vtnet2}
55082 root 19 0 0 0 0 0 0.00% suricata{W#01-vtnet1^}
55082 root 19 0 0 0 0 0 0.00% suricata{W#01-vtnet1}
55082 root 1 0 0 0 0 0 0.00% suricata{CW}
55082 root 0 0 0 0 0 0 0.00% suricata{CS}
That looks like a direct consequence of using usleep.
A quote from Bug #4421: flow manager: using too much CPU during idle (6.0.x backport) - Suricata - Open Information Security Foundation:
the new usleep based loops in the flow manager and recycler consume too much CPU time. In busy systems this has been shown to be more efficient than the old pthread_cond_timedwait logic, but on lower-end systems the usleep approach has too much overhead.
Perhaps @vjulien could chime in, review these observations, and give some directions on what to test.
Instead of the OPNsense VM, I set up an Ubuntu 20.04.3 VM with Suricata installed from the PPA, to make monitoring and troubleshooting easier. Same host, same specs for the VM.
Comparable results, as others mentioned earlier in this thread.
CPU usage inside the VM is not that significant with Suricata 6.0.3; the host CPU usage increase for this VM is.
To see whether context switching is a possible cause for affecting KVM so much, I shut down Suricata in this VM and tried to simulate a similar rate of context switching with stress-ng:
stress-ng --cyclic 1 --cyclic-method usleep
I can see comparable results on the host.
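For the host-side comparison I simply watched the context switch rate there while toggling Suricata and stress-ng in the guest, roughly like this (the pgrep pattern for finding this VM's KVM process is just an example; pidstat comes from the sysstat package):
vmstat 1                                   # "cs" column: system-wide context switches per second
pidstat -w -p $(pgrep -f 'kvm -id 101') 1  # voluntary/involuntary context switches of this guest's QEMU process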
So it seems that somehow the usleep loop is causing excessive context switching, which causes extra CPU load on the host when running under KVM. The real impact depends on the host hardware, but I have already seen several posts on the Proxmox forum (many users run OPNsense with Suricata in a VM) from people struggling with whether they should disable Suricata.
Would it be possible to use the pthread logic for virtual machines and the usleep loop for physical machines?
That would also avoid the struggle from this tracking bug, Bug #4379: flow manager: using too much CPU during idle - Suricata - Open Information Security Foundation:
Perhaps we can make the usleep value configurable. On lower-end systems there is no need to wake up frequently. Or optionally bring back the old behavior. Overall I'm not a great fan of these kinds of options.
Same issue on Proxmox KVM with pfSense. See: https://forum.netgate.com/topic/166657/suricata-increase-in-cpu-use-after-upgrade-to-v6
I'm currently looking at this code, mostly for other reasons, to adjust timings. Is there a minimum usleep value that works well in all these scenarios?
I think the value of v5 was correct. I did not have any issues with v5.
v5 uses a very different logic, without usleep. It has other issues, which is why it was updated.
I really do not know whether a lower usleep value will help. It would be nice if we could test that somehow.
As @janssensm asked before: is it possible to let Suricata use either pthread or usleep?
The question I'm trying to get an answer to is: has anyone experimented with a higher usleep value? I don't know the usleep behavior in these scenarios with VMs, low-end hardware, etc.
First, thank you for still looking into this.
That was already on my to-do list and one of the reasons I set up my test VM with Ubuntu instead of OPNsense.
Testing different settings will probably take some time.
Am I right that the only way to test different usleep values is to compile from source with changed values as in this commit?
Indeed, this exact commit shows how: Next/60x/20210215/v2 by victorjulien · Pull Request #5860 · OISF/suricata · GitHub
Thanks for confirming. I will see what I can test.
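Roughly what I plan to try, as a sketch (the tag, the file I grep, and the build steps are my own assumptions; the commit linked above is leading):
git clone https://github.com/OISF/suricata.git && cd suricata
git checkout suricata-6.0.3
git clone https://github.com/OISF/libhtp.git   # bundled HTP library needed for the build
grep -n usleep src/flow-manager.c              # locate the sleep values used by the FM/FR loops
# edit the value(s), then:
./autogen.sh && ./configure && make -j$(nproc)
sudo make install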
libvirt only helps to pass the values on to QEMU. The behavior of QEMU/KVM varies from version to version. The only way to understand why too many usleep() calls in the guest cause the host CPU to shoot up would be to profile the host kernel. Once the root cause is identified, there would most likely be a QEMU setting to affect the behavior.
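A rough sketch of what that profiling could look like on a Linux host (assuming perf is installed; the exact perf kvm output depends on the kernel version):
# count and summarize VM exits for ~10 seconds while the guest sits idle
perf kvm stat record -a sleep 10
perf kvm stat report
# and/or a plain host-wide kernel profile
perf record -a -g sleep 10
perf report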
As already mentioned in note #33 of Bug #4096: flow manager: 200% CPU in KVM host with no activity with Suricata 6 - Suricata - Open Information Security Foundation
Bare metal too.
Interesting, so this is happening even on bare metal. In that case, could Suricata use adaptive usleep() logic rather than static usleep() values?
Hi @vjulien, I just added a comment with test results in Bug #4421: flow manager: using too much CPU during idle (6.0.x backport) - Suricata - Open Information Security Foundation. Sorry for not testing earlier, but now we can also see the test results for the backported pthread change.