AF_PACKET mode – Guidance requested on performance-oriented Suricata tuning for single-server automated response (NFTBan)

Hello Suricata community,

I am working on NFTBan (GitHub: itcmsgr/nftban), an open-source, enterprise-grade firewall management system built on Linux nftables that combines atomic rule updates, privilege separation through Polkit, and AI-assisted threat intelligence for a resilient, self-healing network defense layer. The project integrates Suricata as a detection engine for server hardening on individual VPS/cloud servers.

This is not a typical NIDS deployment. We are optimizing for:

  • Single-server protection (not network monitoring)
  • Fast automated response (temporary IP bans via nftables)
  • Low resource footprint (VPS constraints: 2–4 cores, 2–8 GB RAM)
  • Focused threat detection: SSH brute-force, web attacks, port scanning, botnet activity

Suricata is used strictly for detection; enforcement is handled externally by our daemon.
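
For context, the external enforcement step can be as simple as adding the offending address to an nftables set with a timeout, so bans expire automatically without a cleanup job. This is an illustrative sketch (table, set, and chain names are hypothetical, not NFTBan's actual layout):

```shell
# One-time setup: a drop rule keyed on a timed set (requires root and nftables)
nft add table inet nftban
nft add set inet nftban banned_v4 '{ type ipv4_addr; flags timeout; }'
nft add chain inet nftban input '{ type filter hook input priority 0; policy accept; }'
nft add rule inet nftban input ip saddr @banned_v4 drop

# Per-alert action: ban an address for 10 minutes; the kernel
# removes the element automatically when the timeout expires
nft add element inet nftban banned_v4 '{ 203.0.113.7 timeout 10m }'
```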


Current Setup

Suricata version: 7.0.x
Ruleset: Emerging Threats Open
Capture: AF_PACKET (mmap, tpacket_v3, cluster_flow)
Output: eve.json (file-based)
Processing: External daemon (Go) consuming events and applying nftables rules
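
For reference, the consuming side is straightforward: filter EVE records by `event_type == "alert"` and hand the source IP to the enforcement layer. A minimal sketch of that parsing step (field names follow Suricata's EVE JSON schema; the ban action is a placeholder print):

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"os"
	"strings"
)

// eveAlert models the subset of EVE JSON fields the daemon needs.
type eveAlert struct {
	EventType string `json:"event_type"`
	SrcIP     string `json:"src_ip"`
	Alert     struct {
		Signature string `json:"signature"`
		Severity  int    `json:"severity"`
	} `json:"alert"`
}

// parseAlert extracts the source IP and signature from an alert event;
// non-alert events and malformed lines are rejected.
func parseAlert(line string) (srcIP, signature string, ok bool) {
	var ev eveAlert
	if err := json.Unmarshal([]byte(line), &ev); err != nil || ev.EventType != "alert" {
		return "", "", false
	}
	return ev.SrcIP, ev.Alert.Signature, true
}

func main() {
	// In production this tails eve.json via inotify; here we read stdin.
	sc := bufio.NewScanner(os.Stdin)
	for sc.Scan() {
		if ip, sig, ok := parseAlert(strings.TrimSpace(sc.Text())); ok {
			fmt.Printf("ban %s (%s)\n", ip, sig) // placeholder for the nftables call
		}
	}
}
```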

Observed performance (production testing on 4-core VPS):

  • CPU usage: ~15% average
  • Memory: ~800 MB resident
  • Alert → block latency: ~300–500 ms (from Suricata alert timestamp to nftables rule applied)
  • Packet drops: <0.1%

Performance Tuning Applied

To reduce CPU and memory usage while maintaining detection for our target threats, we applied aggressive tuning:

  1. Reduced TCP flow timeouts
    • Established: ~60s (down from default 600s)
    • Emergency-closed: ~10s
  2. Disabled HTTP body inspection
    • Headers and URI only
  3. Disabled unused protocol parsers
    • SMB, DCERPC, SMTP
    • DNS limited to basic query/response analysis
  4. AF_PACKET ring tuning
    • ~16k frames, ~2k frame size (effective packet size)
  5. Disabled NIC offloads
    • GRO, LRO, TSO, GSO

Results: ~40% CPU reduction, ~30% memory reduction, while still detecting our target threats in testing.
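
For completeness, the NIC offload settings (item 5) are applied with ethtool rather than in suricata.yaml, so Suricata sees packets as they appear on the wire instead of kernel-coalesced super-frames (replace eth0 with the capture interface):

```shell
ethtool -K eth0 gro off lro off tso off gso off
```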


Questions for Guidance

  1. Flow Timeouts

Are ~60s established TCP timeouts generally safe for SSH and HTTP analysis in our use case?
Do common ET rules for SSH brute-force or web attacks rely on longer flow tracking?

  2. HTTP Body Inspection

With body inspection disabled (URI/headers only):

  • Are there critical ET rules for web scanners (SQLi/XSS) or botnet detection that fundamentally require body inspection?
  • What important blind spots should we expect?

We accept reduced coverage for file-based web attacks, but want to understand what we’re missing.

  3. AF_PACKET: cluster_flow Choice

We chose cluster_flow (rather than cluster_cpu, or switching to the autofp runmode) for the following reasons:

  • Flow integrity: Both directions of a connection processed by the same worker thread
  • State coherence: Critical for stateful attack detection (SSH brute-force, multi-stage web attacks)
  • Reduced atomic overhead: Minimizes internal locking on low-core systems
  • Predictable latency: Flow pinning ensures linear, predictable alert generation

Question: Is cluster_flow still the right choice for 2–4 core systems in this scenario, or is there a better option for low-latency response?

  4. Output Method

For sub-second response times, is file-based eve.json consumption reasonable, or does Unix socket output provide significant performance gains at this scale?

Our external daemon currently tails eve.json with inotify. If socket-based output would meaningfully reduce the 300–500ms latency, we’ll switch.


What I’m NOT Asking For

  • Full configuration review (we have a working setup)
  • General NIDS best practices (different use case)
  • Broad tuning advice beyond the specific questions above

What I AM Asking For

  • Validation that these tuning choices don’t introduce critical blind spots for SSH/web attack detection
  • Best-practice guidance on the specific points above
  • “You’re missing X” feedback if there’s an obvious gap

Note:
We understand this tuning is aggressive compared to typical NIDS deployments.

Our goal is server hardening with automated response, not comprehensive network analysis.

We’re seeking guidance on whether these choices are appropriate for this focused use case.

Thank you for any insights!

Antonios Voulvoulis

I think most scan rules do not rely on the flow; rather, they track by src or dst IP.

I think the main thing to consider here is POST bodies not getting inspected. If that is an acceptable risk then there are no drawbacks I can think of.

I think cluster_flow is fine. The other methods are more sensitive to misconfiguration.

I would not expect a difference between these output methods.

If latency is important, an inline method is more immediate in much of its detection. In IDS mode we wait for the TCP ACK before doing most of the detection. You could experiment with --set stream.inline to put the stream engine in "inline" mode without having to run fully inline.
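
For anyone wanting to try this, the stream-engine mode can be overridden from the command line without editing suricata.yaml (interface name illustrative):

```shell
# IDS mode, but with the stream engine inspecting data as it arrives
# rather than waiting for the TCP ACK
suricata -c /etc/suricata/suricata.yaml --af-packet=eth0 --set stream.inline=yes
```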

Thanks for the detailed feedback — much appreciated.

Your points align well with what we’re seeing in practice:

  • Flow timeouts: We’ll keep ~60s established as the default. As you noted, most brute-force and scan detections are driven by source/destination tracking and thresholds rather than long-lived stream state, and we’re comfortable with the low-and-slow trade-off for this use case.

  • HTTP body inspection: Agreed that the main blind spot is POST bodies. For our minimal profile we’ll keep body inspection disabled. For higher-resource profiles we’ll make a small request-body limit configurable, but response body inspection will remain disabled.

  • AF_PACKET: We’ll stick with cluster_flow. The note about other modes being more sensitive to misconfiguration matches our experience on low-core VPS systems.

  • Output method: Good to hear confirmation that file-based EVE output vs socket is not expected to materially change latency. We’ll continue with file-based output and focus on reducing log volume and output batching rather than changing transport.

  • Latency: The comment about IDS mode waiting for TCP ACKs is especially useful. We plan to experiment with running the stream engine in “inline” mode (without full IPS/NFQUEUE) to see how much it reduces alert→action latency, and we’ll measure CPU/drops impact before adopting it.
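
The configurable request-body limit mentioned above might look like this in suricata.yaml (illustrative value; note that in Suricata a limit of 0 means "unlimited", so a small non-zero cap is what actually bounds the work):

```yaml
app-layer:
  protocols:
    http:
      libhtp:
        default-config:
          request-body-limit: 4kb   # small cap for higher-resource profiles
```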

Thanks again — this helped validate several design decisions for our single-server hardening use case.