Suricata Packet Drop at high rate with Napatech

Hi, I am using a Napatech card following
https://suricata.readthedocs.io/en/suricata-6.0.3/capture-hardware/napatech.html
But it only keeps up with 1-2 Gb/s of traffic; above that I notice significant packet drops.
I am not sure what the issue is, because the hardware should be more than sufficient and the configuration should be able to handle at least 20 Gb/s. I also stopped writing logs to see whether that made any difference, but it still drops packets.
I am sharing my suricata.yaml and Napatech ntservice.ini below.
suricata.yaml:


# Linux high speed capture support
af-packet:
  - interface: eth0
 
    threads: auto
    # Default clusterid. AF_PACKET will load balance packets based on flow.
    cluster-id: 99
    cluster-type: cluster_qm
    # In some fragmentation cases, the hash can not be computed. If "defrag" is set
    # to yes, the kernel will do the needed defragmentation before sending the packets.
    defrag: yes
    # To use the ring feature of AF_PACKET, set 'use-mmap' to yes
    use-mmap: yes
    tpacket-v3: yes
    ring-size: 300000
    # Block size is used by tpacket_v3 only. It should be set to a value high enough to contain
    # a decent number of packets. Size is in bytes, so please consider your MTU. It should be
    # a power of 2 and it must be a multiple of the page size (usually 4096).
    block-size: 2097152
  # Put default values here. These will be used for an interface that is not
  # in the list above.
  - interface: default
    #threads: auto
    #use-mmap: no
    #tpacket-v3: yes

# Cross platform libpcap capture support



##
## Step 4: App Layer Protocol configuration
##

# Configure the app-layer parsers. The protocol's section details each
# protocol.
#
# The option "enabled" takes 3 values - "yes", "no", "detection-only".
# "yes" enables both detection and the parser, "no" disables both, and
# "detection-only" enables protocol detection only (parser disabled).
app-layer:
  protocols:
    rfb:
      enabled: yes
      detection-ports:
        dp: 5900, 5901, 5902, 5903, 5904, 5905, 5906, 5907, 5908, 5909
    # MQTT, disabled by default.
    mqtt:
      # enabled: no
      # max-msg-length: 1mb
      # subscribe-topic-match-limit: 100
      # unsubscribe-topic-match-limit: 100
    krb5:
      enabled: yes
    snmp:
      enabled: yes
    ikev2:
      enabled: yes
    tls:
      enabled: yes
      detection-ports:
        dp: 443



    dcerpc:
      enabled: yes
    ftp:
      enabled: yes
      # memcap: 64mb
    rdp:
      #enabled: yes
    ssh:
      enabled: yes
      #hassh: yes
    # HTTP2: Experimental HTTP 2 support. Disabled by default.
    http2:
      enabled: no
      # use http keywords on HTTP2 traffic
      http1-rules: no
    smtp:
      enabled: yes
      raw-extraction: no
      # Configure SMTP-MIME Decoder
      mime:
        # Decode MIME messages from SMTP transactions
        # (may be resource intensive)
        # This field supersedes all others because it turns the entire
        # process on or off
        decode-mime: yes

        # Decode MIME entity bodies (ie. Base64, quoted-printable, etc.)
        decode-base64: yes
        decode-quoted-printable: yes

        # Maximum bytes per header data value stored in the data structure
        # (default is 2000)
        header-value-depth: 2000

        # Extract URLs and save in state data structure
        extract-urls: yes
        # Set to yes to compute the md5 of the mail body. You will then
        # be able to journalize it.
        body-md5: no
      # Configure inspected-tracker for file_data keyword
      inspected-tracker:
        content-limit: 100000
        content-inspect-min-size: 32768
        content-inspect-window: 4096
    imap:
      enabled: detection-only
    smb:
      enabled: yes
      detection-ports:
        dp: 139, 445

      # Stream reassembly size for SMB streams. By default track it completely.
      #stream-depth: 0

    nfs:
      enabled: yes
    tftp:
      enabled: yes
    dns:
      tcp:
        enabled: yes
        detection-ports:
          dp: 53
      udp:
        enabled: yes
        detection-ports:
          dp: 53
    http:
      enabled: yes
      
      libhtp:
         default-config:
           personality: IDS

           # Can be specified in kb, mb, gb.  Just a number indicates
           # it's in bytes.
           request-body-limit: 100kb
           response-body-limit: 100kb

           # inspection limits
           request-body-minimal-inspect-size: 32kb
           request-body-inspect-window: 4kb
           response-body-minimal-inspect-size: 40kb
           response-body-inspect-window: 16kb

           # response body decompression (0 disables)
           response-body-decompress-layer-limit: 2

           # auto will use http-body-inline mode in IPS mode, yes or no set it statically
           http-body-inline: auto

           # Decompress SWF files.
  
           swf-decompression:
             enabled: yes
             type: both
             compress-depth: 100kb
             decompress-depth: 100kb

           # decoding
           double-decode-path: no
           double-decode-query: no



         server-config:

    modbus:
      # How many unanswered Modbus requests are considered a flood.
      # If the limit is reached, the app-layer-event:modbus.flooded; will match.
      #request-flood: 500

      enabled: no
      detection-ports:
        dp: 502
      # According to MODBUS Messaging on TCP/IP Implementation Guide V1.0b, it
      # is recommended to keep the TCP connection opened with a remote device
      # and not to open and close it for each MODBUS/TCP transaction. In that
      # case, it is important to set the depth of the stream reassembling as
      # unlimited (stream.reassembly.depth: 0)

      # Stream reassembly size for modbus. By default track it completely.
      stream-depth: 0

    # DNP3
    dnp3:
      enabled: no
      detection-ports:
        dp: 20000

    # SCADA EtherNet/IP and CIP protocol support
    enip:
      enabled: no
      detection-ports:
        dp: 44818
        sp: 44818

    ntp:
      enabled: yes

    dhcp:
      enabled: yes

    sip:
      #enabled: no

# Limit for the maximum number of asn1 frames to decode (default 256)
asn1-max-frames: 256

# Datasets default settings
# datasets:
#   # Default fallback memcap and hashsize values for datasets in case these
#   # were not explicitly defined.
#   defaults:
#     memcap: 100mb
#     hashsize: 2048

##############################################################################
##
## Advanced settings below
##
##############################################################################

##
## Run Options
##

# Run Suricata with a specific user-id and group-id:
#run-as:
#  user: suri
#  group: suri

# Some logging modules will use that name in event as identifier. The default
# value is the hostname
#sensor-name: suricata

# Default location of the pid file. The pid file is only used in
# daemon mode (start Suricata with -D). If not running in daemon mode
# the --pidfile command line option must be used to create a pid file.
#pid-file: /usr/local/var/run/suricata.pid

# Daemon working directory
# Suricata will change directory to this one if provided
# Default: "/"
#daemon-directory: "/"

# Umask.
# Suricata will use this umask if it is provided. By default it will use the
# umask passed on by the shell.
#umask: 022


coredump:
  max-dump: unlimited

host-mode: auto

# Number of packets preallocated per thread. The default is 1024. A higher number 
# will make sure each CPU will be more easily kept busy, but may negatively 
# impact caching.
max-pending-packets: 65500

# Runmode the engine should use. Please check --list-runmodes to get the available
# runmodes for each packet acquisition method. Default depends on selected capture
# method. 'workers' generally gives best performance.
#runmode: autofp

# Specifies the kind of flow load balancer used by the flow pinned autofp mode.
#
# Supported schedulers are:
#
# hash     - Flow assigned to threads using the 5-7 tuple hash.
# ippair   - Flow assigned to threads using addresses only.
#
#autofp-scheduler: hash

# Preallocated size for each packet. Default is 1514 which is the classical
# size for pcap on Ethernet. You should adjust this value to the highest
# packet size (MTU + hardware header) on your system.
#default-packet-size: 1514

# Unix command socket that can be used to pass commands to Suricata.
# An external tool can then connect to get information from Suricata
# or trigger some modifications of the engine. Set enabled to yes
# to activate the feature. In auto mode, the feature will only be
# activated in live capture mode. You can use the filename variable to set
# the file name of the socket.
unix-command:
  enabled: auto
  #filename: custom.socket

legacy:
  uricontent: enabled


engine-analysis:
  # enables printing reports for fast-pattern for every rule.
  rules-fast-pattern: yes
  # enables printing reports for each rule
  rules: yes

#recursion and match limits for PCRE where supported
pcre:
  match-limit: 3500
  match-limit-recursion: 1500

##
## Advanced Traffic Tracking and Reconstruction Settings
##

# Host specific policies for defragmentation and TCP stream
# reassembly. The host OS lookup is done using a radix tree, just
# like a routing table so the most specific entry matches.
host-os-policy:
  # Make the default policy windows.
  windows: [0.0.0.0/0]
  bsd: []
  bsd-right: []
  old-linux: []
  linux: []
  old-solaris: []
  solaris: []
  hpux10: []
  hpux11: []
  irix: []
  macos: []
  vista: []
  windows2k3: []

# Defrag settings:

defrag:
  memcap: 1gb
  hash-size: 65536
  trackers: 65535 # number of defragmented flows to follow
  max-frags: 65535 # number of fragments to keep (higher than trackers)
  prealloc: yes
  timeout: 60

# Enable defrag per host settings
#  host-config:
#
#    - dmz:
#        timeout: 30
#        address: [192.168.1.0/24, 127.0.0.0/8, 1.1.1.0/24, 2.2.2.0/24, "1.1.1.1", "2.2.2.2", "::1"]
#
#    - lan:
#        timeout: 45
#        address:
#          - 192.168.0.0/24
#          - 192.168.10.0/24
#          - 172.16.14.0/24

flow:
  memcap: 8gb
  hash-size: 256072
  prealloc: 300000
  emergency-recovery: 30
  #managers: 1 # default to one flow manager
  #recyclers: 1 # default to one flow recycler thread

vlan:
  use-for-tracking: true

flow-timeouts:

  default:
    new: 30
    established: 300
    closed: 0
    bypassed: 100
    emergency-new: 10
    emergency-established: 100
    emergency-closed: 0
    emergency-bypassed: 50
  tcp:
    new: 60
    established: 600
    closed: 60
    bypassed: 100
    emergency-new: 5
    emergency-established: 100
    emergency-closed: 10
    emergency-bypassed: 50
  udp:
    new: 30
    established: 300
    bypassed: 100
    emergency-new: 10
    emergency-established: 100
    emergency-bypassed: 50
  icmp:
    new: 30
    established: 300
    bypassed: 100
    emergency-new: 10
    emergency-established: 100
    emergency-bypassed: 50

stream:
  memcap: 8gb
  checksum-validation: no      # reject incorrect csums
  async-oneside: true
  inline: no                   # auto will use inline mode in IPS mode, yes or no set it statically
  drop-invalid: yes
  bypass: yes
  reassembly:
    memcap: 2gb
    depth: 1mb                  # reassemble 1mb into a stream
    toserver-chunk-size: 2560
    toclient-chunk-size: 2560
    randomize-chunk-size: yes
    #randomize-chunk-range: 10
    #raw: yes
    segment-prealloc: 200000
    #check-overlap-different-data: true

# Host table:
#
# Host table is used by the tagging and per host thresholding subsystems.
#
host:
  hash-size: 4096
  prealloc: 1000
  memcap: 32mb

# IP Pair table:
#
# Used by xbits 'ippair' tracking.
#
#ippair:
#  hash-size: 4096
#  prealloc: 1000
#  memcap: 32mb

# Decoder settings

decoder:
  # Teredo decoder is known to not be completely accurate
  # as it will sometimes detect non-teredo as teredo.
  teredo:
    enabled: true
    # ports to look for Teredo. Max 4 ports. If no ports are given, or
    # the value is set to 'any', Teredo detection runs on _all_ UDP packets.
    ports: $TEREDO_PORTS # syntax: '[3544, 1234]' or '3533' or 'any'.

  # VXLAN decoder is assigned to up to 4 UDP ports. By default only the
  # IANA assigned port 4789 is enabled.
  vxlan:
    enabled: true
    ports: $VXLAN_PORTS # syntax: '[8472, 4789]' or '4789'.

  # VNTag decode support
  vntag:
    enabled: false

  # Geneve decoder is assigned to up to 4 UDP ports. By default only the
  # IANA assigned port 6081 is enabled.
  geneve:
    enabled: true
    ports: $GENEVE_PORTS # syntax: '[6081, 1234]' or '6081'.

  # maximum number of decoder layers for a packet
  # max-layers: 16

##
## Performance tuning and profiling
##
detect:
  profile: high
  custom-values:
    toclient-groups: 3
    toserver-groups: 25
  sgh-mpm-context: auto
  inspection-recursion-limit: 3000
  # If set to yes, the loading of signatures will be made after the capture
  # is started. This will limit the downtime in IPS mode.
  #delayed-detect: yes

  prefilter:
    # default prefiltering setting. "mpm" only creates MPM/fast_pattern
    # engines. "auto" also sets up prefilter engines for other keywords.
    # Use --list-keywords=all to see which keywords support prefiltering.
    default: mpm

  # the grouping values above control how many groups are created per
  # direction. Port whitelisting forces that port to get its own group.
  # Very common ports will benefit, as well as ports with many expensive
  # rules.
  grouping:
    #tcp-whitelist: 53, 80, 139, 443, 445, 1433, 3306, 3389, 6666, 6667, 8080
    #udp-whitelist: 53, 135, 5060

  profiling:
    # Log the rules that made it past the prefilter stage, per packet
    # default is off. The threshold setting determines how many rules
    # must have made it past pre-filter for that rule to trigger the
    # logging.
    #inspect-logging-threshold: 200
    grouping:
      dump-to-disk: false
      include-rules: false      # very verbose
      include-mpm-stats: false



mpm-algo: auto

# Select the matching algorithm you want to use for single-pattern searches.
#
# Supported algorithms are "bm" (Boyer-Moore) and "hs" (Hyperscan, only
# available if Suricata has been built with Hyperscan support).
#
# The default of "auto" will use "hs" if available, otherwise "bm".

spm-algo: auto

# Suricata is multi-threaded. Here the threading can be influenced.
threading:
  set-cpu-affinity: yes
  
  cpu-affinity:
    - management-cpu-set:
        cpu: [ 0,1,2,3,4,5,6,7 ]  # include only these CPUs in affinity settings
    - receive-cpu-set:
        cpu: [ 0 ]  # include only these CPUs in affinity settings
    - worker-cpu-set:
        cpu: [ 30,31,32,33,34,35,36,37,38,39,70,71,72,73,74,75,76,77,78,79 ]
        mode: "exclusive"
        # Use explicitly 3 threads and don't compute number by using
        # detect-thread-ratio variable:
        # threads: 3
        prio:
          low: [ 0 ]
          medium: [ "1-2" ]
          high: [ 30,31,32,33,34,35,36,37,38,39,70,71,72,73,74,75,76,77,78,79 ]
          default: "high"
    #- verdict-cpu-set:
    #    cpu: [ 0 ]
    #    prio:
    #      default: "high"
  detect-thread-ratio: 1.0

# Luajit has a strange memory requirement, its 'states' need to be in the
# first 2G of the process' memory.
#
# 'luajit.states' is used to control how many states are preallocated.
# State use: per detect script: 1 per detect thread. Per output script: 1 per
# script.
luajit:
  states: 128

# Profiling settings. Only effective if Suricata has been built with
# the --enable-profiling configure flag.
#
profiling:
  # Run profiling for every X-th packet. The default is 1, which means we
  # profile every packet. If set to 1000, one packet is profiled for every
  # 1000 received.
  #sample-rate: 1000

  # rule profiling
  rules:

    # Profiling can be disabled here, but it will still have a
    # performance impact if compiled in.
    enabled: yes
    filename: rule_perf.log
    append: yes

    # Sort options: ticks, avgticks, checks, matches, maxticks
    # If commented out all the sort options will be used.
    #sort: avgticks

    # Limit the number of sids for which stats are shown at exit (per sort).
    limit: 10

    # output to json
    json: yes

  # per keyword profiling
  keywords:
    enabled: yes
    filename: keyword_perf.log
    append: yes

  prefilter:
    enabled: yes
    filename: prefilter_perf.log
    append: yes

  # per rulegroup profiling
  rulegroups:
    enabled: yes
    filename: rule_group_perf.log
    append: yes

  # packet profiling
  packets:

    # Profiling can be disabled here, but it will still have a
    # performance impact if compiled in.
    enabled: yes
    filename: packet_stats.log
    append: yes

    # per packet csv output
    csv:

      # Output can be disabled here, but it will still have a
      # performance impact if compiled in.
      enabled: no
      filename: packet_stats.csv

  # profiling of locking. Only available when Suricata was built with
  # --enable-profiling-locks.
  locks:
    enabled: no
    filename: lock_stats.log
    append: yes

  pcap-log:
    enabled: no
    filename: pcaplog_stats.log
    append: yes

##
## Netfilter integration
##

# When running in NFQ inline mode, it is possible to use a simulated
# non-terminal NFQUEUE verdict.
# This permits sending all needed packets to Suricata via this rule:
#        iptables -I FORWARD -m mark ! --mark $MARK/$MASK -j NFQUEUE
# And below, you can have your standard filtering ruleset. To activate
# this mode, you need to set mode to 'repeat'
# If you want a packet to be sent to another queue after an ACCEPT decision
# set the mode to 'route' and set next-queue value.
# On Linux >= 3.1, you can set batchcount to a value > 1 to improve performance
# by processing several packets before sending a verdict (worker runmode only).
# On Linux >= 3.6, you can set the fail-open option to yes to have the kernel
# accept the packet if Suricata is not able to keep pace.
# bypass mark and mask can be used to implement NFQ bypass. If bypass mark is
# set then the NFQ bypass is activated. Suricata will set the bypass mark/mask
# on packets of a flow that needs to be bypassed. The Netfilter ruleset has to
# directly accept all packets of a flow once a packet has been marked.
nfq:
#  mode: accept
#  repeat-mark: 1
#  repeat-mask: 1
#  bypass-mark: 1
#  bypass-mask: 1
#  route-queue: 2
#  batchcount: 20
#  fail-open: yes


napatech:
    # When use_all_streams is set to "yes" the initialization code will query
    # the Napatech service for all configured streams and listen on all of them.
    # When set to "no" the streams config array will be used.
    #
    # This option necessitates running the appropriate NTPL commands to create
    # the desired streams prior to running Suricata.
    # use-all-streams: yes

    # The streams to listen on when auto-config is disabled or when threading
    # cpu-affinity is disabled.  This can be either:
    #   an individual stream (e.g. streams: [0])
    # or
    #   a range of streams (e.g. streams: ["0-3"])
    #
    streams: ["0-19"]

    # Stream stats can be enabled to provide fine grain packet and byte counters
    # for each thread/stream that is configured.
    #
    enable-stream-stats: no

    # When auto-config is enabled the streams will be created and assigned
    # automatically to the NUMA node where the thread resides. If cpu-affinity
    # is enabled in the threading section, the streams will be created
    # according to the number of worker threads specified in the worker-cpu-set.
    # Otherwise, the streams array is used to define the streams.
    #
    # This option is intended primarily to support legacy configurations.
    #
    # This option cannot be used simultaneously with either "use-all-streams"
    # or "hardware-bypass".
    #
    auto-config: yes

    # Enable hardware level flow bypass.
    #
    hardware-bypass: no

    # Enable inline operation.  When enabled traffic arriving on a given port is
    # automatically forwarded out its peer port after analysis by Suricata.
    #
    inline: no

    # Ports indicates which Napatech ports are to be used in auto-config mode.
    # These are the port IDs of the ports that will be merged prior to the
    # traffic being distributed to the streams.
    #
    # When hardware-bypass is enabled the ports must be configured as a segment.
    # Specify the port(s) on which upstream and downstream traffic will arrive.
    # This information is necessary for the hardware to properly process flows.
    #
    # When using a tap configuration one of the ports will receive inbound traffic
    # for the network and the other will receive outbound traffic. The two ports on a
    # given segment must reside on the same network adapter.
    #
    # When using a SPAN-port configuration the upstream and downstream traffic
    # arrives on a single port. This is configured by setting the two sides of the
    # segment to reference the same port.  (e.g. 0-0 to configure a SPAN port on
    # port 0).
    #
    # port segments are specified in the form:
    #    ports: [0-1,2-3,4-5,6-6,7-7]
    #
    # For legacy systems when hardware-bypass is disabled this can be specified in any
    # of the following ways:
    #
    #   a list of individual ports (e.g. ports: [0,1,2,3])
    #
    #   a range of ports (e.g. ports: [0-3])
    #
    #   "all" to indicate that all ports are to be merged together
    #   (e.g. ports: [all])
    #
    # This parameter has no effect if auto-config is disabled.
    #
    ports: [all]

    # When auto-config is enabled the hashmode specifies the algorithm for
    # determining to which stream a given packet is to be delivered.
    # This can be any valid Napatech NTPL hashmode command.
    #
    # The most common hashmode commands are:  hash2tuple, hash2tuplesorted,
    # hash5tuple, hash5tuplesorted and roundrobin.
    #
    # See the Napatech NTPL documentation for other hashmodes and details on their use.
    #
    # This parameter has no effect if auto-config is disabled.
    #
    hashmode: hash5tuplesorted

##
## Configure Suricata to load Suricata-Update managed rules.
##

default-rule-path: /usr/local/var/lib/suricata/rules

rule-files:
  - suricata.rules  

##
## Auxiliary configuration files.
##

classification-file: /usr/local/etc/suricata/classification.config
reference-config-file: /usr/local/etc/suricata/reference.config
# threshold-file: /usr/local/etc/suricata/threshold.config

ntservice.ini of the Napatech card:

[System]
HostBufferRefreshIntervalAll = default   # default* - 1 - 5 - 10 - 50 - 100 - 250 - 500
LinkPropagationPortPairs =               # [portA, portB], ...
NtplFileName =                           # String
NumWorkerThreads = 16                    # 1 .. 100
SDRAMFillLevelWarning = 0                # X1, X2, X3, X4
TimeSyncOsTimeReference = None           # None* - adapter-0 - adapter-1 - adapter-2 - adapter-3 - adapter-4 - adapter-5 - adapter-6 - adapter-7
TimestampFormat = NATIVE_UNIX            # NATIVE - NATIVE_NDIS - NATIVE_UNIX* - UNIX_NS - PCAP - PCAP_NS
TimestampMethod = EOF                    # SOF - EOF*

[Logging]
LogBufferWrap = wrap                     # wrap* - nowrap
LogFileName = /tmp/Log3G_%s.log          # String
LogMask = 7                              # See ini-file help for information about possible values
LogToFile = false                        # true/false
LogToSystem = true                       # true/false

[Adapter0]
AdapterType = NT20E3_2_PTP               # NT40A01_4X1 - NT20E3_2_PTP - NT40E3_4_PTP - NT50B01_2X10_25 - NT50B01_2X25 - NT50B01_2X1_10 - NT100A01_4X1_10 - NT100A01_4X10_25 - NT80E3_2_PTP - NT80E3_2_PTP_8X10 - NT100E3_1_PTP - NT200A01 - NT200A01_2X100 - NT200A01_8X10 - NT200A01_2X40 - NT200A01_2X10_25 - NT200A01_2X25 - NT200A02_2X10_25 - NT200A02_2X25 - NT200A02_2X100 - NT200A02_2X40 - NT200A02_4X10_25 - NT200A02_4X25 - NT200A02_8X10 - NT200A02_2X1_10 - NT4E - NT20E - NT4E_STD - NT20E2 - NT40E2_1 - NT40E2_4 - NT4E2_PTP - NT20E2_PTP - INTEL_A10_4X10 - INTEL_A10_1X40
BondingType = Separate                   # Separate*
CancelTxOnCloseMask = 0                  # See ini-file help for information about possible values
DeduplicationWindow = 100                # 10 .. 2000000
DisableTxRemoteFault = 0                 # 1 - 0* - true - false*
DiscardSize = 16                         # 16 .. 63
HighFrequencySampling = DISABLE          # DISABLE* - ENABLE
HostBufferHandlerAffinity = -2           # -2 .. 79
HostBufferPollInterval = default         # default* - 10 - 50 - 100 - 250 - 500 - 1000 - 10000 - 25000 - 50000 - 100000
HostBufferRefreshIntervalRx = default    # default* - 1 - 5 - 10 - 50 - 100 - 250 - 500
HostBufferRefreshIntervalTx = default    # default* - 1 - 5 - 10 - 50 - 100 - 250 - 500
HostBufferSegmentAlignmentRx = default   # default* - none - 0 - 512 - 1024 - 2048 - 4096
HostBufferSegmentSizeRx = default        # default* - dynamic - 0 - 1 - 2 - 4 - 64K - 128K - 256K - 512K - 1M - 2M - 4M
HostBufferSegmentSizeTx = default        # default* - 1 - 2 - 4 - 1M - 2M - 4M
HostBufferSegmentTimeOut = default       # default* - 10 - 50 - 100 - 250 - 500 - 1000 - 10000 - 25000 - 50000 - 100000
HostBuffersRx = [20,256,3],[20,256,1]                # [x1, x2, x3], ...
HostBuffersTx = [4,16,-1]                # [x1, x2, x3], ...
IfgMode = NS                             # NS* - BYTE
KmTcamConfig = [2,4,0],[4,1,0]           # [cnt, len, dualLookup], ...
MaxFrameSize = 9018                      # 1518 .. 10000
NumaNode = -1                            # -1 .. 16
OnBoardMemorySplit = Even                # Dynamic - Even* - Proportional
PacketDescriptor = NT                    # PCAP - NT* - Ext9
PortDisableMask = 0                      # See ini-file help for information about possible values
PortSpeedMultiRate = 10G, 10G            # 1G - 10G
Profile = Capture                           # None* - Capture
PtpDhcp = ENABLE                         # DISABLE - ENABLE*
PtpMasterModeAllowed = DISABLE           # DISABLE* - ENABLE
PtpProfile = Default                     # Default* - Telecom - Power - Enterprise - G.8275.1
PtpUserDescription = Napatech adapter    # String
SofLinkSpeed = 10G                       # 100M - 1G - 10G
TimeSyncConnectorExt1 = PpsIn            # None - NttsIn* - PpsIn - NttsOut - PpsOut - RepeatInt1 - RepeatInt2
TimeSyncConnectorInt1 = None             # None* - NttsIn - PpsIn - NttsOut - PpsOut - RepeatExt1 - RepeatInt2
TimeSyncConnectorInt2 = None             # None* - NttsIn - PpsIn - NttsOut - PpsOut - RepeatExt1 - RepeatInt1
TimeSyncHardReset = ENABLE               # DISABLE - ENABLE*
TimeSyncNTTSInSyncLimit = 5000           # 1 .. 4294967295
TimeSyncOSInSyncLimit = 50000            # 1 .. 4294967295
TimeSyncPPSInSyncLimit = 5000            # 1 .. 4294967295
TimeSyncPTPInSyncLimit = 5000            # 1 .. 4294967295
TimeSyncReferencePriority = OSTime	 # FreeRun* - PTP - Int1 - Int2 - Ext1 - OSTime
TimeSyncTimeOffset = 0                   # 0 .. 1000000
TimestampInjectAlways = false            # true/false, ...
TimestampInjectDynamicOffset = TSI_DYN_SOF # TSI_DYN_SOF* - TSI_DYN_EOF - TSI_DYN_L3 - TSI_DYN_L4
TimestampInjectStaticOffset = 0          # -16384 .. 16383, ...
TxTiming = RELATIVE                      # ABSOLUTE - RELATIVE*
VXLANAltDestinationPorts = 4789,4789     # X1, X2


Are cores 30 and 70 (and 31 and 71, etc.) thread siblings? I don’t recommend sharing a core with its hyperthreaded sibling. Could you include the output of lscpu?
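
If they are siblings, here is a minimal sketch of a sibling-free worker-cpu-set, assuming the layout shown in the lstopo output further down, where P#30-39 and P#70-79 are hyperthread pairs on NUMA node 3; adjust the CPU list to your actual topology:

threading:
  set-cpu-affinity: yes
  cpu-affinity:
    - worker-cpu-set:
        # one logical CPU per physical core; the HT siblings (70-79) are left unused
        cpu: [ 30,31,32,33,34,35,36,37,38,39 ]
        mode: "exclusive"
        prio:
          default: "high"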

Note that in your ntservice.ini file, you don’t need to establish any system buffers for TX.

If you have sufficient system memory, you can increase the size of each system buffer.
You’re specifying that NUMA nodes 3 and 1 be used for the system buffers; is that your intent?
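
For reference, my reading of the host-buffer syntax (please verify against the ini-file help) is that each [x1,x2,x3] entry means [number of host buffers, buffer size in MB, NUMA node], so your current line

HostBuffersRx = [20,256,3],[20,256,1]    # 20 x 256 MB buffers on NUMA node 3, plus 20 x 256 MB on node 1

is what places buffers on nodes 3 and 1.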

The Napatech NICs are capable of much higher performance. It would help to see your system architecture (see the lstopo command in the hwloc package to get this).

Lastly, how are you measuring drops? The Napatech monitoring and profiling utilities can provide detailed information: monitoring will show drops and the types of drops that are occurring; profiling will show information for each stream and whether the ingress network traffic is split properly.
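
If you also want to cross-check drops from the Suricata side, the napatech section of your suricata.yaml already has a per-stream stats option (enable-stream-stats, currently set to no); a small sketch, with the counters showing up in Suricata's stats output:

napatech:
    enable-stream-stats: yes    # per-stream packet/byte counters in Suricata's stats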

Hi
1] I noticed this thread-sibling problem yesterday and switched to 28 worker threads that do not share a core with any hyperthreaded sibling.
2] That TX buffer was the default; I hope it does not affect performance.
3] As you said I can increase the buffers: do you mean the RX entries such as [20,256,1]? What size should I pick? In the profiling tool I sometimes see my buffers at 100%.
Sharing my lstopo output:

Machine (110GB total)
  NUMANode L#0 (P#0 31GB) + Package L#0 + L3 L#0 (24MB)
    L2 L#0 (256KB) + L1d L#0 (32KB) + L1i L#0 (32KB) + Core L#0
      PU L#0 (P#0)
      PU L#1 (P#40)
    L2 L#1 (256KB) + L1d L#1 (32KB) + L1i L#1 (32KB) + Core L#1
      PU L#2 (P#1)
      PU L#3 (P#41)
    L2 L#2 (256KB) + L1d L#2 (32KB) + L1i L#2 (32KB) + Core L#2
      PU L#4 (P#2)
      PU L#5 (P#42)
    L2 L#3 (256KB) + L1d L#3 (32KB) + L1i L#3 (32KB) + Core L#3
      PU L#6 (P#3)
      PU L#7 (P#43)
    L2 L#4 (256KB) + L1d L#4 (32KB) + L1i L#4 (32KB) + Core L#4
      PU L#8 (P#4)
      PU L#9 (P#44)
    L2 L#5 (256KB) + L1d L#5 (32KB) + L1i L#5 (32KB) + Core L#5
      PU L#10 (P#5)
      PU L#11 (P#45)
    L2 L#6 (256KB) + L1d L#6 (32KB) + L1i L#6 (32KB) + Core L#6
      PU L#12 (P#6)
      PU L#13 (P#46)
    L2 L#7 (256KB) + L1d L#7 (32KB) + L1i L#7 (32KB) + Core L#7
      PU L#14 (P#7)
      PU L#15 (P#47)
    L2 L#8 (256KB) + L1d L#8 (32KB) + L1i L#8 (32KB) + Core L#8
      PU L#16 (P#8)
      PU L#17 (P#48)
    L2 L#9 (256KB) + L1d L#9 (32KB) + L1i L#9 (32KB) + Core L#9
      PU L#18 (P#9)
      PU L#19 (P#49)
  NUMANode L#1 (P#1 31GB) + Package L#1 + L3 L#1 (24MB)
    L2 L#10 (256KB) + L1d L#10 (32KB) + L1i L#10 (32KB) + Core L#10
      PU L#20 (P#10)
      PU L#21 (P#50)
    L2 L#11 (256KB) + L1d L#11 (32KB) + L1i L#11 (32KB) + Core L#11
      PU L#22 (P#11)
      PU L#23 (P#51)
    L2 L#12 (256KB) + L1d L#12 (32KB) + L1i L#12 (32KB) + Core L#12
      PU L#24 (P#12)
      PU L#25 (P#52)
    L2 L#13 (256KB) + L1d L#13 (32KB) + L1i L#13 (32KB) + Core L#13
      PU L#26 (P#13)
      PU L#27 (P#53)
    L2 L#14 (256KB) + L1d L#14 (32KB) + L1i L#14 (32KB) + Core L#14
      PU L#28 (P#14)
      PU L#29 (P#54)
    L2 L#15 (256KB) + L1d L#15 (32KB) + L1i L#15 (32KB) + Core L#15
      PU L#30 (P#15)
      PU L#31 (P#55)
    L2 L#16 (256KB) + L1d L#16 (32KB) + L1i L#16 (32KB) + Core L#16
      PU L#32 (P#16)
      PU L#33 (P#56)
    L2 L#17 (256KB) + L1d L#17 (32KB) + L1i L#17 (32KB) + Core L#17
      PU L#34 (P#17)
      PU L#35 (P#57)
    L2 L#18 (256KB) + L1d L#18 (32KB) + L1i L#18 (32KB) + Core L#18
      PU L#36 (P#18)
      PU L#37 (P#58)
    L2 L#19 (256KB) + L1d L#19 (32KB) + L1i L#19 (32KB) + Core L#19
      PU L#38 (P#19)
      PU L#39 (P#59)
  NUMANode L#2 (P#2 16GB) + Package L#2 + L3 L#2 (24MB)
    L2 L#20 (256KB) + L1d L#20 (32KB) + L1i L#20 (32KB) + Core L#20
      PU L#40 (P#20)
      PU L#41 (P#60)
    L2 L#21 (256KB) + L1d L#21 (32KB) + L1i L#21 (32KB) + Core L#21
      PU L#42 (P#21)
      PU L#43 (P#61)
    L2 L#22 (256KB) + L1d L#22 (32KB) + L1i L#22 (32KB) + Core L#22
      PU L#44 (P#22)
      PU L#45 (P#62)
    L2 L#23 (256KB) + L1d L#23 (32KB) + L1i L#23 (32KB) + Core L#23
      PU L#46 (P#23)
      PU L#47 (P#63)
    L2 L#24 (256KB) + L1d L#24 (32KB) + L1i L#24 (32KB) + Core L#24
      PU L#48 (P#24)
      PU L#49 (P#64)
    L2 L#25 (256KB) + L1d L#25 (32KB) + L1i L#25 (32KB) + Core L#25
      PU L#50 (P#25)
      PU L#51 (P#65)
    L2 L#26 (256KB) + L1d L#26 (32KB) + L1i L#26 (32KB) + Core L#26
      PU L#52 (P#26)
      PU L#53 (P#66)
    L2 L#27 (256KB) + L1d L#27 (32KB) + L1i L#27 (32KB) + Core L#27
      PU L#54 (P#27)
      PU L#55 (P#67)
    L2 L#28 (256KB) + L1d L#28 (32KB) + L1i L#28 (32KB) + Core L#28
      PU L#56 (P#28)
      PU L#57 (P#68)
    L2 L#29 (256KB) + L1d L#29 (32KB) + L1i L#29 (32KB) + Core L#29
      PU L#58 (P#29)
      PU L#59 (P#69)
  NUMANode L#3 (P#3 31GB) + Package L#3 + L3 L#3 (24MB)
    L2 L#30 (256KB) + L1d L#30 (32KB) + L1i L#30 (32KB) + Core L#30
      PU L#60 (P#30)
      PU L#61 (P#70)
    L2 L#31 (256KB) + L1d L#31 (32KB) + L1i L#31 (32KB) + Core L#31
      PU L#62 (P#31)
      PU L#63 (P#71)
    L2 L#32 (256KB) + L1d L#32 (32KB) + L1i L#32 (32KB) + Core L#32
      PU L#64 (P#32)
      PU L#65 (P#72)
    L2 L#33 (256KB) + L1d L#33 (32KB) + L1i L#33 (32KB) + Core L#33
      PU L#66 (P#33)
      PU L#67 (P#73)
    L2 L#34 (256KB) + L1d L#34 (32KB) + L1i L#34 (32KB) + Core L#34
      PU L#68 (P#34)
      PU L#69 (P#74)
    L2 L#35 (256KB) + L1d L#35 (32KB) + L1i L#35 (32KB) + Core L#35
      PU L#70 (P#35)
      PU L#71 (P#75)
    L2 L#36 (256KB) + L1d L#36 (32KB) + L1i L#36 (32KB) + Core L#36
      PU L#72 (P#36)
      PU L#73 (P#76)
    L2 L#37 (256KB) + L1d L#37 (32KB) + L1i L#37 (32KB) + Core L#37
      PU L#74 (P#37)
      PU L#75 (P#77)
    L2 L#38 (256KB) + L1d L#38 (32KB) + L1i L#38 (32KB) + Core L#38
      PU L#76 (P#38)
      PU L#77 (P#78)
    L2 L#39 (256KB) + L1d L#39 (32KB) + L1i L#39 (32KB) + Core L#39
      PU L#78 (P#39)
      PU L#79 (P#79)

5] I am using the Napatech profiling tool to see drops.

My concern now: at 10 Gb/s, with eve.json enabled (alerts only) and 15k+ rules,
I am seeing 15-17% packet drops, and my CPUs go to 100%.

Can you suggest some changes to my suricata.yaml or ntservice.ini to bring drops down to 0%?
Thanks

Is this an AMD system?
Are you using Hyperscan (it works with AMD)?
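
If Hyperscan is compiled in, mpm-algo/spm-algo set to auto will already pick it up (as the comments in your yaml note), but you can also pin it explicitly; a small sketch, assuming your build has Hyperscan support:

mpm-algo: hs
spm-algo: hs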

No need for the TX buffers to be configured; I suggest removing them to simplify things.

Each host buffer should be the same size, so if you adjust from 256 MB to something else, apply the same new size to all host buffers.

What NUMA node is the NIC attached to? I suggest moving all the host buffers to the NUMA node containing the NIC. The latency from the Napatech adapter to host memory is critical, so placing the host buffers and the NIC on the same NUMA node is important.
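
A minimal sketch of what that could look like in ntservice.ini, assuming the NIC sits on NUMA node 0 and you have memory for larger buffers (the node number and sizes here are placeholders, not a recommendation for your exact box):

[Adapter0]
NumaNode = 0                             # NUMA node the adapter is attached to
HostBuffersRx = [20,512,0]               # all RX host buffers the same size, on the adapter's node
# HostBuffersTx removed or left at its default, since no TX buffers are needed for capture-only use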

Are you using Suricata’s “threaded” mode for eve.json output? If not, you should; it will greatly reduce contention when writing alerts.
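
For reference, a minimal sketch of the relevant outputs section in suricata.yaml (I believe the threaded option is available from Suricata 6.0 on; with it enabled each thread writes its own eve file):

outputs:
  - eve-log:
      enabled: yes
      filetype: regular
      filename: eve.json
      threaded: true          # per-thread output files, reduces lock contention
      types:
        - alert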