If I disable HyperThreading I am able to have Suricata threads honoring CPU affinity for all 4 network interfaces (3 on NUMA node 0 and 1 on node 1), but am only able to run 8 threads per CPU. If I enable HT,
19 threads per NIC are possible, but CPU affinity becomes a mess: 22 threads end up on the wrong NUMA node.
See the challenge:
HT disabled:
NUMA node0 CPU(s): 0-31
NUMA node1 CPU(s): 32-63
HT enabled:
NUMA node0 CPU(s): 0-31,64-95
NUMA node1 CPU(s): 32-63,96-127
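For reference, the knob involved is the threading block in suricata.yaml. A minimal sketch for the HT-enabled layout above could look like this (core lists and the size of the management set are illustrative, not my exact values; note that all interfaces draw their workers from the single worker-cpu-set, which is part of why the placement gets messy):

threading:
  set-cpu-affinity: yes
  cpu-affinity:
    - management-cpu-set:
        cpu: [ 0, 32 ]              # one core per NUMA node for management threads (illustrative)
    - worker-cpu-set:
        cpu: [ "1-31", "33-63" ]    # physical cores only; HT siblings 64-127 left out
        mode: "exclusive"
        prio:
          default: "high"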
What is the preferred option performance-wise? More threads with messy CPU affinity, or fewer threads honoring CPU affinity?
The best option is to have more threads while still honoring thread affinity.
I can see the problem you are having. While this is getting better on the Suricata side, I thought it might be possible to run two Suricata processes, one on each NUMA node.
So you would run Suricata.proc1 on NUMA node 0 with 3 interfaces and Suricata.proc2 on NUMA node 1 with the remaining interface. To better support this deployment scenario, you might move one of the interfaces from NUMA node 0 to NUMA node 1 to balance the load, although that also depends on what traffic you expect on the individual interfaces.
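Purely as an illustration (the interface name, core ranges and file name below are made up, not a tested config), the second instance bound to NUMA node 1 could be described by a separate suricata-numa1.yaml along these lines:

# suricata-numa1.yaml - hypothetical second instance handling only the NUMA node 1 NIC
af-packet:
  - interface: ens2f0          # placeholder name for the NIC attached to NUMA node 1
    threads: 19
    cluster-id: 98             # keep distinct from the cluster-ids used by the other instance
    cluster-type: cluster_flow
threading:
  set-cpu-affinity: yes
  cpu-affinity:
    - management-cpu-set:
        cpu: [ 32, 33 ]        # node 1 cores for management threads
    - worker-cpu-set:
        cpu: [ "34-63" ]       # remaining node 1 physical cores for the workers
        mode: "exclusive"

The first instance would keep the NUMA node 0 interfaces and the 0-31 range, and the two instances would each need their own pidfile, logging directory and unix socket path so they don't clash.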
Otherwise, I would consider having more threads to be better than strict NUMA node locality. However, please note that adding a hyperthreaded core (where its sibling core already runs Suricata) is not equal to a free physical core, even one on a separate NUMA node that doesn't run Suricata.
A hyperthreaded core only exploits the fact that the "main" core must sometimes wait, e.g. for memory operations.
For your use case I would assume that running Suricata on all NUMA nodes and all cores yields better results, since you use more cores than you would if you locked those 3 interfaces to NUMA node 0.
Haha, that was indeed my idea.
This morning we switched one interface from NUMA node 0 to NUMA node 1, so the load is now better distributed. Before correcting the affinity: is it better to have HT disabled?
Cheers
It also really depends on the volume of traffic you mirror and on the lscpu output, which in turn depends on the Intel/AMD CPU architecture etc. The one you posted above looks good.
One thing to watch out for (at least in my observations) is also the memory/RAM NUMA location/pinning, if it is done specifically for a process.
In a lot of cases you can run Suricata with multiple NICs/ports residing on different NUMA nodes. However, as soon as it reaches the total configured memory usage for Suricata (i.e. lots of traffic per port, 20 Gbps+) and hits the limits of the RAM allocated on one NUMA node, it becomes non-optimal and you need to split the deployment configuration.
I'm wondering if you can share some details about your setup, like CPU/NIC etc.?
Do you see a measurable difference with HT enabled versus disabled?
Yep, 24h, so I was surprised too, given that Thursdays are busier than Wednesdays according to our networking statistics. Can't find an explanation at the moment.
This afternoon we started running Suricata 7 on this box with HT disabled in the BIOS, so I had to recalculate the affinity, thread counts and NIC placement in suricata.yaml, but it is running now. First observation: the NIC-bound cores sometimes wiggle between 100% and 99%; I can't recall this happening while HT was enabled. Next: the management-bound cores are busy, which is logical because they are now running on 6 cores instead of the previous 12.
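To make the recalculation concrete, this is a sketch of what such an HT-disabled layout could look like (core numbers are illustrative, not copied from the real file), and where the 6 management cores come from:

# HT disabled in BIOS: only cores 0-31 (node 0) and 32-63 (node 1) exist
threading:
  set-cpu-affinity: yes
  cpu-affinity:
    - management-cpu-set:
        cpu: [ 0, 1, 2, 32, 33, 34 ]    # 6 management cores instead of the previous 12
    - worker-cpu-set:
        cpu: [ "3-31", "35-63" ]        # remaining cores for the NIC-bound workers
        mode: "exclusive"
        prio:
          default: "high"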