Af-packet might be stopping VPN traffic from functioning

We are not sure yet what is causing the problem, but VPN traffic seems to be stopped sometimes. We did not change anything.

When we bypass the af-packet interfaces, the VPN traffic does not get stopped. Everything was working before, and we are not sure what would be stopping the traffic. af-packet is set to IPS mode. Should we try changing af-packet to tap mode for troubleshooting purposes?

We are running Ubuntu 18.04.4 LTS (GNU/Linux 4.15.0-96-generic x86_64).

Here are our NIC offloading settings when running in af-packet mode:
sudo ethtool -k enp2s0f0
Features for enp2s0f0:
rx-checksumming: on
tx-checksumming: on
tx-checksum-ipv4: off [fixed]
tx-checksum-ip-generic: on
tx-checksum-ipv6: off [fixed]
tx-checksum-fcoe-crc: off [fixed]
tx-checksum-sctp: on
scatter-gather: off
tx-scatter-gather: off
tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: off
tx-tcp-segmentation: off
tx-tcp-ecn-segmentation: off [fixed]
tx-tcp-mangleid-segmentation: off
tx-tcp6-segmentation: off
udp-fragmentation-offload: off
generic-segmentation-offload: off
generic-receive-offload: off
large-receive-offload: off [fixed]
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: off
receive-hashing: on
highdma: on [fixed]
rx-vlan-filter: on [fixed]
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: on
tx-gre-csum-segmentation: on
tx-ipxip4-segmentation: on
tx-ipxip6-segmentation: on
tx-udp_tnl-segmentation: on
tx-udp_tnl-csum-segmentation: on
tx-gso-partial: on
tx-sctp-segmentation: off [fixed]
tx-esp-segmentation: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off [fixed]
rx-all: off
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off [fixed]
hw-tc-offload: off [fixed]
esp-hw-offload: off [fixed]
esp-tx-csum-hw-offload: off [fixed]
rx-udp_tunnel-port-offload: off [fixed]
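
In case it helps to compare offload settings across the two interfaces, here is a minimal Python sketch that parses `ethtool -k` output like the listing above and reports the features that are still enabled and not marked `[fixed]`. The function names are made up for illustration, and whether any of these offloads is related to the VPN problem is not established.

```python
def parse_ethtool_features(text):
    """Parse `ethtool -k` output into {feature: (enabled, fixed)}."""
    features = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        value = value.strip()
        if not sep or not value:
            continue  # skips the "Features for <iface>:" header and blank lines
        state = value.split()[0]
        if state not in ("on", "off"):
            continue  # ignore anything that is not a feature line
        features[name.strip()] = (state == "on", "[fixed]" in value)
    return features


def changeable_enabled(features):
    """Names of features that are on and could be turned off with ethtool -K."""
    return sorted(name for name, (on, fixed) in features.items() if on and not fixed)
```

Feeding it the captured output of `sudo ethtool -k enp2s0f0` and of the peer interface makes it easy to diff the two lists.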

Here is our af-packet configuration:
- interface: enp2s0f0
  threads: auto
  cluster-id: 99
  cluster-type: cluster_flow
  defrag: yes
  use-mmap: yes
  buffer-size: 64535
  copy-mode: ips
  copy-iface: enp2s0f1
- interface: enp2s0f1
  threads: auto
  cluster-id: 98
  cluster-type: cluster_flow
  defrag: yes
  use-mmap: yes
  buffer-size: 64535
  copy-mode: ips
  copy-iface: enp2s0f0

Where should we look to see what might be causing this problem?

We took the inline connections out and connected the router directly to the firewall, and then the IPsec VPN worked correctly. So something in AF-Packet or the IKEv2 parser is causing the problem.

We started analyzing the IKEv2 traffic data and noticed that some users had multiple transactions or key exchanges within seconds to a couple of minutes of each other with the same Flow ID. Other people had just a two-way transaction and purportedly did not have any problem. It is almost as if a delay is causing the problem. We are thinking of disabling IKEv2 parsing to see if that is the cause.
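
For anyone who wants to repeat this kind of check, here is a rough Python sketch of the analysis: it groups `ikev2` events from an EVE JSON log by `flow_id` and flags flows that contain more than a single request/response pair. The function names and the threshold of three events are assumptions chosen for illustration, not something Suricata provides.

```python
import json
from collections import defaultdict
from datetime import datetime

# Matches EVE timestamps such as 2020-06-11T17:04:16.266929+0000
TS_FORMAT = "%Y-%m-%dT%H:%M:%S.%f%z"


def ikev2_flows(lines):
    """Return {flow_id: (event_count, span_seconds)} for ikev2 events."""
    stamps_by_flow = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated or non-JSON lines
        if event.get("event_type") != "ikev2":
            continue
        stamp = datetime.strptime(event["timestamp"], TS_FORMAT)
        stamps_by_flow[event["flow_id"]].append(stamp)
    return {
        flow_id: (len(stamps), (max(stamps) - min(stamps)).total_seconds())
        for flow_id, stamps in stamps_by_flow.items()
    }


def repeated_exchanges(summary, min_events=3):
    """Flows with more events than one two-way exchange (threshold is a guess)."""
    return {fid: info for fid, info in summary.items() if info[0] >= min_events}
```

Passing an open file handle for the EVE log to `ikev2_flows` and then calling `repeated_exchanges` on the result gives the flow IDs worth a closer look, along with how many events each had and over how many seconds.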

We are seeing the following in the traffic. Not sure if this will help figure out the problem but there are some things we don’t understand.
{"timestamp":"2020-06-11T17:04:16.266929+0000","flow_id":1856378882516822,"in_iface":"enp2s0f1","event_type":"ikev2","src_ip":"x.x.x.x","src_port":500,"dest_ip":"x.x.x.x","dest_port":500,"proto":"UDP","ikev2":{"version_major":2,"version_minor":0,"exchange_type":34,"message_id":0,"init_spi":"a0e411a9a55928ca","resp_spi":"34129ce9ee6f3f9e","role":"responder","alg_enc":"ENCR_AES_CBC","alg_auth":"AUTH_HMAC_SHA1_96","alg_prf":"PRF_HMAC_SHA1","alg_dh":"1024-bit MODP Group","alg_esn":"NoESN","errors":0,"payload":["SecurityAssociation","KeyExchange","Nonce","Notify","Notify","VendorID","NoNextPayload"],"notify":["NAT_DETECTION_SOURCE_IP","NAT_DETECTION_DESTINATION_IP"]}}
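
To make records like this easier to read in bulk, a small Python sketch along these lines can turn each one into a one-line summary. The exchange type numbers come from RFC 7296 (34 is IKE_SA_INIT); the function name and the output layout are only illustrative assumptions.

```python
import json

# IKEv2 exchange type numbers as assigned in RFC 7296.
EXCHANGE_TYPES = {
    34: "IKE_SA_INIT",
    35: "IKE_AUTH",
    36: "CREATE_CHILD_SA",
    37: "INFORMATIONAL",
}


def summarize_ikev2(line):
    """Return a short human-readable summary of one EVE ikev2 record."""
    event = json.loads(line)
    ike = event["ikev2"]
    number = ike["exchange_type"]
    exchange = EXCHANGE_TYPES.get(number, f"unknown({number})")
    return (
        f"{event['timestamp']} {ike['role']} {exchange} "
        f"msg_id={ike['message_id']} spi={ike['init_spi']}/{ike['resp_spi']}"
    )
```

Running it over every `ikev2` line of a flow shows whether the repeated transactions are new IKE_SA_INIT attempts (a client retrying from scratch) or later exchanges on the same SA.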

What do "VendorID" and "NoNextPayload" mean?
What does "alg_esn":"NoESN" mean?

In this instance, there were 2 bidirectional transactions. Sometimes we only see one pair of bidirectional transactions. They are always within seconds of each other, and it does not matter which ISP connection is being used, contrary to what we were originally told.

We don’t have a good confirmation yet, but we disabled the IKEv2 parser and the VPN now appears to be working fine.

Do you use any IKEv2-related rules as well, or just the parser for the app layer?

We did not see any signature rules triggered. We just disabled IKEv2 parsing, and people were able to VPN in from one of the ISP connections with no problems. We are waiting for a decision to have the other ISP connection replugged into the Suricata appliance so we can see the results on that. They were having trouble with VPN on both ISP connections.