Suricata relevant statistics for a monitoring dashboard

Hello,

I’m building a Grafana dashboard on top of metrics collected by Telegraf and stored in InfluxDB. I’d like to share it with the community once I have something worthwhile.

The purpose of this dashboard is not necessarily to show all the metrics, but just those that provide a quick view of resource usage and problem situations (at both the Suricata and NIC level).

What I have so far:

  • Host total CPU usage %
  • Host total RAM usage %
  • Suricata process CPU usage %
  • Suricata process RAM usage %
  • Suricata internal stats: capture_errors, capture_kernel_drops, tcp_reassembly_gap (totals across all threads)
  • Ethtool stats: rx/tx dropped/errors

What else would you recommend for a generic dashboard? I know NICs report brand- and model-specific counters through ethtool, so ideally this would not include those.

Thanks!

You might consider including Suricata per-thread CPU usage as well, to indicate how uniform the workload is. This may help with identifying elephant flows or other “interesting” traffic patterns.

Consider a setup with multiple worker threads processing traffic. A worker with disproportionately higher CPU usage than the others may indicate a high-speed elephant flow or other anomaly.
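
As a rough sketch of that imbalance check (not something Suricata or Telegraf provides), the snippet below flags any worker whose CPU usage is well above the average of the other workers. The function name `find_hot_workers`, the `ratio_threshold` default of 2.0, and the sample thread names and values are all assumptions for illustration; feed it per-thread CPU percentages from whatever collector you use.

```python
from statistics import mean


def find_hot_workers(cpu_by_thread, ratio_threshold=2.0):
    """Return the names of workers whose CPU usage is at least
    ratio_threshold times the mean usage of all the other workers.

    cpu_by_thread: dict mapping thread name -> CPU usage in percent.
    ratio_threshold: hypothetical cutoff, tune it for your traffic.
    """
    hot = []
    for name, usage in cpu_by_thread.items():
        # Compare each worker against the rest, so that one very busy
        # thread does not inflate its own baseline.
        others = [v for n, v in cpu_by_thread.items() if n != name]
        if not others:
            continue  # a single worker has nothing to be compared with
        baseline = mean(others)
        # Skip the ratio when the other workers are completely idle.
        if baseline > 0 and usage / baseline >= ratio_threshold:
            hot.append(name)
    return sorted(hot)


# Illustrative values only, not taken from a real sensor.
sample = {
    "W#01-eth0": 22.0,
    "W#02-eth0": 25.0,
    "W#03-eth0": 91.0,
    "W#04-eth0": 24.0,
}
print(find_hot_workers(sample))  # ['W#03-eth0']
```

On a dashboard, the same idea could be shown as a simple max-to-average ratio panel across worker threads, so a single saturated worker stands out even when total CPU usage looks healthy.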
