Can you narrow down whether the drops increase in the timeframes when the backups are running? Elephant Flows are a typical cause of drops.
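To line the drops up with the backup windows, something like the following could log the NIC counters with a timestamp. This is only a rough sketch: `filter_drop_counters` and `watch_drops` are made-up helper names, and the `drop|miss` pattern is an assumption, since counter names in `ethtool -S` output differ between drivers.

```shell
#!/bin/sh
# Hypothetical helper: read `ethtool -S` output on stdin and keep only the
# counters whose names contain "drop" or "miss", prefixed with a timestamp.
# The pattern is an assumption; check your driver's counter names first.
filter_drop_counters() {
    awk -v ts="$1" 'tolower($0) ~ /drop|miss/ {
        sub(/^[ \t]+/, "")   # strip the indentation ethtool adds
        print ts, $0
    }'
}

# Hypothetical helper: sample the counters of one interface every N seconds
# (default 60) until interrupted.
watch_drops() {
    iface=$1
    interval=${2:-60}
    while true; do
        ethtool -S "$iface" | filter_drop_counters "$(date '+%Y-%m-%dT%H:%M:%S')"
        sleep "$interval"
    done
}
```

Running `watch_drops enp5s0f1 60 >> nic-drops.log` over a night that includes a backup should show whether the counters jump in that window.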
You could try cluster_flow in the workers mode.
And those NIC drops can also be caused by those Elephant Flows. Your NIC uses the ixgbe driver, right?
If you want to use cluster_qm, you should also make sure that you enable symmetric hashing and some other optimizations. If you have a reasonably recent ethtool version and drivers, try this:
ethtool -L enp5s0f1 combined 10
ethtool -X enp5s0f1 hkey 6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A equal 10
ethtool -K enp5s0f1 rxhash on
ethtool -K enp5s0f1 ntuple on
for i in rx tx tso gso gro lro tx-nocache-copy sg txvlan rxvlan; do ethtool -K enp5s0f1 $i off; done
for proto in tcp4 udp4 tcp6 udp6; do ethtool -N enp5s0f1 rx-flow-hash $proto sdfn; done
ethtool -C enp5s0f1 adaptive-rx off rx-usecs 62
ethtool -G enp5s0f1 rx 1024
/usr/local/bin/set_irq_affinity 4-13 enp5s0f1
Some suggest using sd instead of sdfn, and the other values are also worth playing around with.
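If you want to switch between sd and sdfn while testing, a small wrapper like this might save some typing. It is just a sketch: `set_flow_hash` and the `DRY_RUN` variable are names I made up, and by default it only prints the commands instead of running them.

```shell
#!/bin/sh
# Hypothetical helper: set the RSS hash fields for TCP/UDP over IPv4/IPv6.
# Fields per ethtool: s = src IP, d = dst IP, f = L4 bytes 0-1 (src port),
# n = L4 bytes 2-3 (dst port).
# DRY_RUN (assumed variable, default 1) prints the commands instead of
# applying them; set DRY_RUN=0 to actually call ethtool (needs root).
set_flow_hash() {
    iface=$1
    fields=$2
    for proto in tcp4 udp4 tcp6 udp6; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "ethtool -N $iface rx-flow-hash $proto $fields"
        else
            ethtool -N "$iface" rx-flow-hash "$proto" "$fields"
        fi
    done
}

# Example: set_flow_hash enp5s0f1 sd
```

Afterwards, `ethtool -n enp5s0f1 rx-flow-hash tcp4` should show which fields are currently used, so you can check that the change took effect before comparing drop rates.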