Dear developers,
I am trying to measure how Suricata 6.0.10 with Hyperscan scales across cores, using some pcap files. I measured the total run time in seconds with different numbers of cores and threads. I ran this command with a specific configuration file:
./bench_install_root/usr/bin/suricata -c suricata_bench/suricata.yaml -r ./pcap_files -l ./log_std/log_hs_hs
I got output like this with 8C16T:
5/5/2023 -- 15:26:15 - <Notice> - This is Suricata version 6.0.10 RELEASE running in USER mode
5/5/2023 -- 15:26:16 - <Error> - [ERRCODE: SC_WARN_JA3_DISABLED(309)] - ja3 support is not enabled
5/5/2023 -- 15:26:16 - <Error> - [ERRCODE: SC_ERR_INVALID_SIGNATURE(39)] - error parsing signature "alert tls $HOME_NET any -> $EXTERNAL_NET any (msg:"ET JA3 Hash - Suspected Cobalt Strike Malleable C2 M1 (set)"; flow:established,to_server; ja3.hash; content:"eb88d0b3e1961a0562f006e5ce2a0b87"; ja3.string; content:"771,49192-49191-49172-49171"; flowbits:set,ET.cobaltstrike.ja3; flowbits:noalert; classtype:command-and-control; sid:2028831; rev:1; metadata:affected_product Windows_XP_Vista_7_8_10_Server_32_64_Bit, attack_target Client_Endpoint, created_at 2019_10_15, deployment Perimeter, former_category JA3, malware_family Cobalt_Strike, signature_severity Major, updated_at 2019_10_15, mitre_tactic_id TA0011, mitre_tactic_name Command_And_Control, mitre_technique_id T1001, mitre_technique_name Data_Obfuscation;)" from file /home/xuhao/suricata_bench/bench_install_root/var/lib/suricata/rules/emerging-all.rules at line 27115
5/5/2023 -- 15:26:19 - <Error> - [ERRCODE: SC_WARN_JA3_DISABLED(309)] - ja3(s) support is not enabled
5/5/2023 -- 15:26:19 - <Error> - [ERRCODE: SC_ERR_INVALID_SIGNATURE(39)] - error parsing signature "alert tls $EXTERNAL_NET any -> $HOME_NET any (msg:"ET JA3 HASH - Possible RustyBuer Server Response"; flowbits:isset,ET.rustybuer; ja3s.hash; content:"f6dfdd25d1522e4e1c7cd09bd37ce619"; reference:md5,ea98a9d6ca6f5b2a0820303a1d327593; classtype:bad-unknown; sid:2032960; rev:1; metadata:attack_target Client_Endpoint, created_at 2021_05_13, deployment Perimeter, former_category JA3, malware_family RustyBuer, performance_impact Low, signature_severity Major, updated_at 2021_05_13;)" from file /home/xuhao/suricata_bench/bench_install_root/var/lib/suricata/rules/emerging-all.rules at line 60175
5/5/2023 -- 15:26:28 - <Notice> - all 17 packet processing threads, 4 management threads initialized, engine started.
5/5/2023 -- 15:26:51 - <Notice> - Signal Received. Stopping engine.
5/5/2023 -- 15:26:52 - <Notice> - Pcap-file module read 3 files, 8383530 packets, 4178235146 bytes
I used the time difference between the last and the first line of this output (line 8 and line 1) as the performance metric for this test. The run time decreased going from the 1C1T configuration up to 4C4T. However, when I allocate more cores and threads beyond that, the performance no longer improves.
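To make the metric concrete, here is a minimal Python sketch of how the elapsed time can be computed from the log lines above. The helper names (`parse_timestamp`, `elapsed_seconds`) are my own, and the day/month/year reading of the timestamp is an assumption. It also computes the time from the "engine started" line to the last line, since the first-to-last interval includes startup and rule loading.

```python
from datetime import datetime

# Assumed timestamp layout of the log lines: day/month/year -- HH:MM:SS
TIMESTAMP_FORMAT = "%d/%m/%Y -- %H:%M:%S"


def parse_timestamp(log_line: str) -> datetime:
    """Extract the leading timestamp from one Suricata log line."""
    # The timestamp is everything before the first " - <Level>" separator.
    stamp = log_line.split(" - <", 1)[0].strip()
    return datetime.strptime(stamp, TIMESTAMP_FORMAT)


def elapsed_seconds(first_line: str, last_line: str) -> float:
    """Seconds between the timestamps of two log lines."""
    delta = parse_timestamp(last_line) - parse_timestamp(first_line)
    return delta.total_seconds()


# Lines copied from the 8C16T output above.
first = "5/5/2023 -- 15:26:15 - <Notice> - This is Suricata version 6.0.10 RELEASE running in USER mode"
started = "5/5/2023 -- 15:26:28 - <Notice> - all 17 packet processing threads, 4 management threads initialized, engine started."
last = "5/5/2023 -- 15:26:52 - <Notice> - Pcap-file module read 3 files, 8383530 packets, 4178235146 bytes"

total = elapsed_seconds(first, last)          # whole run, including startup
after_start = elapsed_seconds(started, last)  # from "engine started" to the end

print(f"total: {total:.0f} s, after engine start: {after_start:.0f} s")
```

For the 8C16T run above, this gives 37 s in total, of which 24 s fall after the "engine started" line.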
I configured 4C4T in suricata.yaml:
threading:
  set-cpu-affinity: yes
  # Tune cpu affinity of threads. Each family of threads can be bound
  # to specific CPUs.
  #
  # These 2 apply to the all runmodes:
  # management-cpu-set is used for flow timeout handling, counters
  # worker-cpu-set is used for 'worker' threads
  #
  # Additionally, for autofp these apply:
  # receive-cpu-set is used for capture threads
  # verdict-cpu-set is used for IPS verdict threads
  #
  cpu-affinity:
    - management-cpu-set:
        cpu: ["44-55", "100-111"]
    - receive-cpu-set:
        cpu: ["36-43", "92-99"]
    - worker-cpu-set:
        cpu: ["28-31"]
        mode: "exclusive"
        # Use explicitly 3 threads and don't compute number by using
        # detect-thread-ratio variable:
        threads: 4
        prio:
          low: [ 0 ]
          medium: [ "1-2" ]
          high: ["28-35", "84-91"]
          default: "high"
    #- verdict-cpu-set:
    #    cpu: [ 0 ]
    #    prio:
    #      default: "high"
8C8T:
threading:
  set-cpu-affinity: yes
  # Tune cpu affinity of threads. Each family of threads can be bound
  # to specific CPUs.
  #
  # These 2 apply to the all runmodes:
  # management-cpu-set is used for flow timeout handling, counters
  # worker-cpu-set is used for 'worker' threads
  #
  # Additionally, for autofp these apply:
  # receive-cpu-set is used for capture threads
  # verdict-cpu-set is used for IPS verdict threads
  #
  cpu-affinity:
    - management-cpu-set:
        cpu: ["44-55", "100-111"]
    - receive-cpu-set:
        cpu: ["36-43", "92-99"]
    - worker-cpu-set:
        cpu: ["28-35"]
        mode: "exclusive"
        # Use explicitly 3 threads and don't compute number by using
        # detect-thread-ratio variable:
        threads: 8
        prio:
          low: [ 0 ]
          medium: [ "1-2" ]
          high: ["28-35", "84-91"]
          default: "high"
    #- verdict-cpu-set:
    #    cpu: [ 0 ]
    #    prio:
    #      default: "high"
The results showed that the run took 31 s with 4C4T and 30 s with 8C8T. When I use more cores and threads, I get no performance improvement.
Did I do something wrong in testing core scaling? If so, how should I test it correctly?
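As a rough illustration of what these two numbers mean for scaling, the following Python sketch computes the speedup and the scaling efficiency between the two runs. The function names are hypothetical, and "efficiency" here simply means the measured speedup divided by the factor by which the thread count grew.

```python
def speedup(baseline_seconds: float, scaled_seconds: float) -> float:
    """How many times faster the scaled run is than the baseline run."""
    return baseline_seconds / scaled_seconds


def scaling_efficiency(baseline_seconds: float, baseline_threads: int,
                       scaled_seconds: float, scaled_threads: int) -> float:
    """Measured speedup divided by the ideal (linear) speedup."""
    ideal = scaled_threads / baseline_threads
    return speedup(baseline_seconds, scaled_seconds) / ideal


# Measured run times from my tests: 31 s with 4C4T, 30 s with 8C8T.
s = speedup(31.0, 30.0)
e = scaling_efficiency(31.0, 4, 30.0, 8)

print(f"speedup 4T -> 8T: {s:.2f}x, efficiency: {e:.0%}")
```

With perfect scaling, doubling the threads would halve the run time (speedup 2.0, efficiency 100%); my measurements are far from that.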
I also have another question. I configured 4C4T, but the output shows 5 packet processing threads:
5/5/2023 -- 16:20:38 - <Notice> - all 5 packet processing threads, 4 management threads initialized, engine started.
The output always shows one more packet processing thread than I configured. Can you tell me why?
I would be very grateful if you could help me. Thanks!
suricata.yaml (73.3 KB)