The overhead in FlowGetFlowFromHash is quite high. Can you post the full suricata.log and stats.log from the test run without rules?
With 6.5 Gbit/s we would have around 500 Mbit/s per core, which could still be too much, but if more threads didn’t help, it must be something specific to the traffic.
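
As a rough sanity check of that per-core figure, here is a minimal sketch of the arithmetic. The thread count of 13 is an assumption picked so that the result matches the ~500 Mbit/s estimate, and the helper name is only illustrative; plug in your real worker count. It also assumes the load is spread evenly across workers, which may not hold if a few large flows dominate.

```python
def per_core_mbit(total_gbit: float, worker_threads: int) -> float:
    """Average load per worker in Mbit/s, assuming evenly balanced traffic."""
    if worker_threads <= 0:
        raise ValueError("worker_threads must be positive")
    return total_gbit * 1000 / worker_threads


# 13 worker threads is an assumed value, not taken from your setup.
print(per_core_mbit(6.5, 13))  # 500.0
```

If the real per-worker numbers in stats.log are far from this average, that would point to uneven distribution rather than raw volume.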
Is this regular LAN and WAN traffic?
Is the traffic encapsulated?
Do you have bidirectional traffic?
The performance guide is written for Intel cards, so it won’t apply to Mellanox cards.