The following is an initial attempt at a discussion on how to go about integrating or using the DPDK API in Suricata.
Note: I am new to the Suricata code flow, so I am requesting feedback and corrections on the chosen steps.
Motivation:
Starting with Suricata 3.0, the initial motivation was to use DPDK rx_burst and tx_burst to allow line-rate capture and to measure the threshold limit for a single worker thread with limited zero-copy. The goal was to:
Identify the zero-packet-drop scenario for varying packet sizes.
Run multiple instances in VMs/Docker to scale up/down on demand.
Determine the maximum number of worker threads for 40 Gbps processing.
The initial work started out using Intel e1000, and was later ported to tap, ixgbe, i40e and vhost. Based on rule/signature additions we extend the filtering to rule-matched packets, so that matched packets are forwarded to the copy interface and processed by the worker threads. In high-data-rate scenarios where there are no rules, packets simply go into BYPASS mode and only the statistics are updated.
Note: The current sample can be found at https://github.com/vipinpv85/DPDK-Suricata_3.0/. Not all scenarios have been tested or validated.
With the release of Suricata 4.1.1, goals were set for:
Full worker mode for multiple threads.
Packet reassembly for ipv4/ipv6 fragments.
Static HW-RSS with worker pinning.
Deterministic flow-to-worker pinning.
Flatten MBUF for full zero-copy.
The ongoing work can be found at https://github.com/vipinpv85/DPDK_SURICATA-4_1_1
Use HW offloads and eBPF-based filter/clone actions from DPDK HW/SW.
For the new model, I would like some feedback on the following:
Merged mode - DPDK threads and Suricata threads run in the same process.
Split mode - DPDK threads run as process P1 and Suricata threads as process P2.
Advantages of split mode:
No HW- or vendor-specific code.
The Suricata baseline will only carry minimal, generic DPDK API usage.
Easy to implement a packet clone feature via either HW or SW DPDK APIs.
Allows Suricata to update ACL entries when new rules are added.
To achieve this we can use configuration or YAML entries specific to the DPDK interfacing. An initial pull request is shared at https://github.com/OISF/suricata/pull/4902
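A hypothetical sketch of what such a YAML section could look like (the key names below are illustrative only and are not taken from the pull request):

```yaml
dpdk:
  # EAL arguments handed to rte_eal_init() (illustrative keys, not the PR's schema)
  eal-args: "-l 0-3 --proc-type=primary"
  interfaces:
    - interface: 0000:3b:00.0   # PCI address of the port
      rx-queues: 2
      tx-queues: 2
      mempool-size: 65535
```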
Please work with our git master, as we’ll not consider such a large feature for our 4.1 or 5.0 stable branches.
Do you think it is possible to have a reasonably generic way of supporting DPDK? I’ve seen multiple attempts and all of them were for quite specific scenarios.
Support should probably be added in smaller steps: first a basic packet source with a runmode, then bigger changes to other parts of Suricata.
Summary: description of DPDK build, library and run-mode integration into the dev branch of Suricata.
Introduction:
DPDK is a set of hardware and software libraries that enables packet processing in user space.
DPDK processes are classified as primary and secondary; huge pages and devices are shared between them.
For an existing application to support DPDK, both the build and the code need to change.
One can skip the DPDK lcore threads and service threads, but still has to invoke rte_eal_init and the relevant library calls.
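As a rough illustration, a minimal sketch of such an initialization path is shown below, assuming DPDK 19.11 APIs; the EAL arguments are illustrative and error handling is trimmed:

```c
#include <rte_eal.h>
#include <rte_ethdev.h>
#include <rte_mbuf.h>

static struct rte_mempool *mbuf_pool;

/* Sketch only: bring up the EAL and one port with a single RX/TX queue. */
static int dpdk_init(void)
{
    /* EAL arguments can be built by the application (e.g. from suricata.yaml)
     * instead of coming from the real command line. */
    char *eal_argv[] = { "suricata", "-l", "0-1", "--proc-type=primary", NULL };

    if (rte_eal_init(4, eal_argv) < 0)
        return -1;

    mbuf_pool = rte_pktmbuf_pool_create("mbuf_pool", 8191, 256, 0,
                                        RTE_MBUF_DEFAULT_BUF_SIZE,
                                        rte_socket_id());
    if (mbuf_pool == NULL)
        return -1;

    /* One RX and one TX queue on port 0; defaults are fine for a first test. */
    struct rte_eth_conf port_conf = { 0 };
    if (rte_eth_dev_configure(0, 1, 1, &port_conf) < 0 ||
        rte_eth_rx_queue_setup(0, 0, 1024, rte_eth_dev_socket_id(0), NULL, mbuf_pool) < 0 ||
        rte_eth_tx_queue_setup(0, 0, 1024, rte_eth_dev_socket_id(0), NULL) < 0)
        return -1;

    return rte_eth_dev_start(0);
}
```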
Which mode to use (to be decided - which one to support and start with; a small sketch of detecting the process type at runtime follows the list below):
Primary (singleton/monolithic):
Pros:
a) Suricata will run as primary, managing all DPDK PMDs and libraries.
b) Requires access to hugepages and root permission.
c) Does not need ASLR to be disabled.
d) Can run on bare metal, in a VM, or in Docker.
e) Can make use of DPDK secondary apps like proc-info, pdump, or any other custom secondary application.
Cons:
a) Plausible to run as non-root, but requires DPDK familiarity.
b) Code becomes bulky.
c) For HW vendor or device offloads, the code needs to be updated with generic APIs or SW fallbacks.
Secondary:
Pros:
a) Suricata will run as secondary with zero or very little management and setup code for PMDs and libraries.
b) Requires access to hugepages and root permission.
c) ASLR needs to be disabled for a consistent or higher chance of starting successfully.
d) Can run on bare metal, in a VM, or in Docker.
e) Code becomes lighter.
Cons:
a) Plausible to run as non-root, but requires DPDK familiarity.
b) Cannot make use of DPDK secondary apps like proc-info, pdump, or any other custom secondary application.
c) Needs to probe the configuration settings for HW vendor or device offloads.
Detached Primary:
Pros:
a) Suricata will run as primary, getting packets from another DPDK primary via a memif/vhost/AF_XDP interface.
b) Requires access to huge pages and root permission.
c) Can run on bare metal, in a VM, or in Docker.
d) Code becomes lighter because we use a generic SW NIC and offloads.
e) All vendor-specific and non-DPDK offloads can run in the other process.
f) Useful in scenarios where selective packet mirroring can be implemented in HW or SW and fed to DPDK.
Cons:
a) Plausible to run as non-root, but requires DPDK familiarity.
b) Secondary apps like proc-info, pdump, and any other custom secondary application still work.
c) Can make use of XDP (eBPF) to redirect selected traffic too.
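To illustrate how the primary/secondary split surfaces in code, here is a small sketch (not from any patch) using the real DPDK call rte_eal_process_type(); the setup helper is hypothetical:

```c
#include <stdlib.h>
#include <rte_eal.h>
#include <rte_debug.h>

/* Hypothetical helper that would live in Suricata's DPDK source file. */
extern void setup_ports_and_queues(void);

/* Sketch: only a primary process configures ports/queues; a secondary
 * attaches to the shared huge pages and reuses what the primary created. */
void dpdk_runmode_init(int argc, char **argv)
{
    if (rte_eal_init(argc, argv) < 0)
        rte_exit(EXIT_FAILURE, "EAL init failed\n");

    if (rte_eal_process_type() == RTE_PROC_PRIMARY)
        setup_ports_and_queues();
    /* else: secondary - look up the existing ports/rings and start polling */
}
```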
How to do it:
There are ABI and API changes across DPDK releases.
Use a long-term stable release as the de-facto DPDK baseline, for example 19.11.1 LTS.
Depending on individual or distro releases, not all NICs, HW or features are enabled.
Identify and choose the most common NICs such as memif/pcap/tap/vhost for ease of building.
Update configure.ac to reflect (see the sketch after this list):
a) $RTE_SDK & $RTE_TARGET for a custom or distro DPDK package.
b) Add a new --enable-dpdk flag.
c) Add the necessary changes to CFLAGS and LDFLAGS when the flag is enabled.
Add the compiler flag HAVE_DPDK to build for DPDK mode.
Start with single- and multi-worker modes.
Code changes in:
a) suricata.c: DPDK initialization, run-mode registration, parsing the DPDK sections of suricata.yaml, and a hook into rule addition for the DPDK ACL.
b) source-dpdk, runmode-dpdk: new files to support DPDK configuration and worker threads.
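A hedged sketch of the configure.ac fragment described above; it assumes either the distro's pkg-config file (libdpdk.pc) or the $RTE_SDK/$RTE_TARGET environment variables, and the exact wording is illustrative rather than taken from the pull request:

```
AC_ARG_ENABLE([dpdk],
    AS_HELP_STRING([--enable-dpdk], [Enable DPDK capture support]),
    [enable_dpdk=$enableval], [enable_dpdk=no])

AS_IF([test "x$enable_dpdk" = "xyes"], [
    dnl Prefer the distro package via pkg-config, fall back to $RTE_SDK/$RTE_TARGET.
    PKG_CHECK_MODULES([DPDK], [libdpdk], [
        CFLAGS="$CFLAGS $DPDK_CFLAGS"
        LIBS="$LIBS $DPDK_LIBS"
    ], [
        AS_IF([test -n "$RTE_SDK"], [
            CFLAGS="$CFLAGS -I$RTE_SDK/$RTE_TARGET/include"
            LDFLAGS="$LDFLAGS -L$RTE_SDK/$RTE_TARGET/lib"
            LIBS="$LIBS -ldpdk"
        ], [AC_MSG_ERROR([--enable-dpdk given but neither libdpdk.pc nor RTE_SDK found])])
    ])
    AC_DEFINE([HAVE_DPDK], [1], [Build with DPDK support])
])
```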
Please try to upload the image directly here, so it’s correctly embedded. If it doesn’t work, let us know.
On what machine with what ruleset? It might be interesting to establish a solid comparison with the same input and the different options for how to run Suricata, so we can see how much value DPDK could add.
Machine: Intel® Xeon® CPU E5-2699 v4 @ 2.20GHz
NIC: Ethernet Controller X710 for 10GbE SFP+ (2 x 10G), driver version=2.1.14-k, firmware=6.01
Packet generator: DPDK pktgen for ARP, ICMP, TCP, UDP
Test scenario: 64B, 128B, 512B packets (line rate)
Rule set: single rule with alert/drop action (to stress single worker)
alert udp any any -> any any (ttl:123; prefilter; sid:1;)
drop udp any any -> any any
drop tcp any any -> any any (msg:"Dir Command - Possible Remote Shell"; content:"dir"; sid:10000001;)
Note: the goal is to stress the environment with either no match or a full match.
How do the merged/split modes relate to the primary/secondary/etc?
In general if we can keep the integration footprint small it would be great. But this also depends on the performance. If deeper, more intrusive, integration has clear performance and/or usability advantages, then we might still want to go that route.
I think maybe it would be a good idea to start with the simplest and least intrusive integration first and then evaluate when that is complete? Then in a phase 2 we could consider the more complex modes?
How do the merged/split modes relate to the primary/secondary/etc?
Answer> DPDK can work either in standalone mode (as a single primary) or in multi-process mode (as primary-secondary).
If we go with the primary/single-process model, the Suricata application needs to have both the management code and the data-processing code in the same binary. As mentioned in the cons, this increases the code size and the maintenance effort.
To address this gap we have 2 options:
Make Suricata a secondary process: this reduces the code size as it only houses the data-processing code. A separate DPDK primary is responsible for configuration, setup, and the addition of new NICs, libraries and features. As mentioned in the earlier comment, this comes with its own cons.
Make Suricata a separate primary process: now we have 2 DPDK primary processes, let's call them APP-1 and APP-2 (Suricata). APP-1 is responsible for the HW NIC interfaces, configuration, library setup and so on. APP-1 connects to APP-2 (Suricata) via memif/vhost interfaces only. This reduces the management and library code, the build dependencies, and the distro release issues (DPDK pkg).
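As an illustration of how the two primaries could be wired together, here is a hedged sketch of EAL command lines; the --proc-type, --file-prefix and --vdev flags are standard DPDK EAL options, while the application names, core lists and socket path are placeholders:

```
# APP-1: owns the physical NIC and exposes selected packets on a vhost-user socket
./app1 -l 0-1 --proc-type=primary --file-prefix=app1 \
       --vdev='net_vhost0,iface=/tmp/suricata.sock,queues=1'

# APP-2 (Suricata): a separate primary attached only to the virtual interface
./suricata-dpdk -l 2-5 --proc-type=primary --file-prefix=app2 \
       --vdev='virtio_user0,path=/tmp/suricata.sock,queues=1'
```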
In general if we can keep the integration footprint small it would be great.
Answer> My recommendation is to use split mode with primary-1 & primary-2.
But this also depends on the performance. If deeper, more intrusive, integration has clear performance and/or usability advantages, then we might still want to go that route.
Answer> As mentioned earlier, the current goal is to add the infrastructure required to fetch packets directly into user space via DPDK. The following features are targeted:
Phase 1:
Allow PMD polling in receive threads in the DPDK run mode - done
Filter out non-IP packets - done
Use ACL to classify/mark packets which need Suricata processing - done
Use RSS to distribute flows to multiple workers - done (see the sketch after this list)
Allow zero copy - done
Avoid Packet_t alloc - done
Run on a general processor (x86 Xeon) and a SmartNIC - done
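For the RSS item above, a minimal sketch of enabling hardware RSS so that each worker polls its own RX queue might look like this (DPDK 19.11 macro names assumed; later releases rename them to RTE_ETH_*):

```c
#include <rte_ethdev.h>
#include <rte_mbuf.h>

/* Sketch: one RX queue per worker; flows hash deterministically to a queue. */
static int setup_rss(uint16_t port_id, uint16_t nb_workers, struct rte_mempool *pool)
{
    struct rte_eth_conf port_conf = {
        .rxmode = { .mq_mode = ETH_MQ_RX_RSS },
        .rx_adv_conf = {
            .rss_conf = {
                .rss_key = NULL,   /* use the PMD's default RSS key */
                .rss_hf  = ETH_RSS_IP | ETH_RSS_TCP | ETH_RSS_UDP,
            },
        },
    };

    if (rte_eth_dev_configure(port_id, nb_workers, nb_workers, &port_conf) < 0)
        return -1;

    for (uint16_t q = 0; q < nb_workers; q++) {
        if (rte_eth_rx_queue_setup(port_id, q, 1024,
                                   rte_eth_dev_socket_id(port_id), NULL, pool) < 0)
            return -1;
        if (rte_eth_tx_queue_setup(port_id, q, 1024,
                                   rte_eth_dev_socket_id(port_id), NULL) < 0)
            return -1;
    }
    return rte_eth_dev_start(port_id);
}
```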
Phase 2:
Allow autofp - possible
Allow primary-secondary model - possible
Use perf or VTune on decode/stream/output for possible DPDK acceleration - to do
Use Hyperscan for better matching - not doing, as Suricata already supports it.
Reassemble fragments before the receive thread - to do
Use user-space packet copy - to do
Use DPDK eBPF for pkt-clone, tunnel parsing/decap/encap, mark - prototype is ready, needs to be added to Suricata
I think maybe it would be a good idea to start with the simplest and least intrusive integration first
Answer> The standalone model with memif/vhost is the best choice as it can be deployed on Docker/VM/bare metal alike. I will add a simple DPDK reference code too.
and then evaluate when that is complete? Then in a phase 2 we could consider the more complex modes?
Answer> Sure
Thanks. I think we can probably start the code contribution process around the ‘phase 1’ features?
2 further thoughts:
autofp support doesn’t seem very important.
timing: we are approaching a ‘freeze’ as we near the Suricata 6 release date. If you want to get the initial support into 6, there are just over 2 weeks left to get it done. No pressure
Thanks for Vipin Varghese’s work.
Machine: Intel® Xeon® CPU Gold 5115 @ 2.40GHz
NIC: Ethernet Controller X722 for 10GbE SFP+ (2 x 10G), driver version=2.4.6, firmware=3.33
When I was testing suricata-dpdk, the packet loss on the network card kept increasing once the rate exceeded 1 Gbps. Using perf to observe function hotspots, the ‘common_ring_mp_enqueue’, ‘DpdkReleasePacket’ and ‘ReceiveDpdkLoop’ functions take up a large proportion, more than 50%. Can you tell me what I can do to reduce the NIC packet drops?
I am a bit at a loss here; I have not shared any pull request to the forum for testing. Hence I have to assume the code base you are referring to is one of my earlier works on GitHub. If this is true, I think you are using a pretty old version of the code base, because:
the current code base on GitHub does not use DPDK ring enqueue/dequeue
as per my internal testing with 28 Mpps, I do not find DpdkReleasePacket to be a bottleneck
without context, saying ReceiveDpdkLoop uses >50% is a bit misleading. If this is based on the older ring implementation, it might be true.