Suricata rules and machine learning classification of traffic anomalies

Wilker_Luiz_Gadelha · May 1, 2025, 1:12am

I need help with a dataset that contains the Suricata ID, signature, classification, category, and severity (1 = highest, 2 = medium, 3 = lowest), along with other flow features. How can I use this data to segment and classify these alerts with machine learning techniques? Is it redundant to add my own classification views? How can these attributes be used? Is it possible to get better performance or benchmarks this way? Could Suricata fail to record every relevant alert event when the rule set is very large? I would appreciate help thinking through how to integrate machine learning techniques with a dataset that is a collection of log files.
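
As a starting point, the alert fields listed above can be flattened into one row per alert. The sketch below is only an illustration. It assumes the logs are in Suricata's EVE JSON format (one JSON object per line, with the alert details nested under `alert`). The helper names `flatten_alert`, `load_alerts`, and `severity_by_category` are hypothetical, and which flow fields to keep is an arbitrary choice.

```python
import json
from collections import Counter


def flatten_alert(line):
    """Return a flat dict for one EVE JSON line, or None if it is not an alert."""
    event = json.loads(line)
    if event.get("event_type") != "alert":
        return None
    alert = event.get("alert", {})
    return {
        "timestamp": event.get("timestamp"),
        "src_ip": event.get("src_ip"),
        "dest_ip": event.get("dest_ip"),
        "proto": event.get("proto"),
        "signature_id": alert.get("signature_id"),
        "signature": alert.get("signature"),
        "category": alert.get("category"),
        "severity": alert.get("severity"),
    }


def load_alerts(lines):
    """Flatten every alert found in an iterable of EVE JSON lines."""
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        row = flatten_alert(line)
        if row is not None:
            rows.append(row)
    return rows


def severity_by_category(rows):
    """Count alerts per (category, severity) pair as a first summary view."""
    return Counter((row["category"], row["severity"]) for row in rows)
```

With a real log this could be called as `load_alerts(open("eve.json"))`, where the file name is an assumption about the local setup. The resulting rows can then be passed to whatever feature engineering or model is chosen.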

hydn · May 18, 2025, 9:07pm

Definitely not redundant to add your own classification view. Custom views help spot real patterns, especially false positives. Also interested in following this topic.

azuleonyx · May 18, 2025, 11:58pm

@Wilker_Luiz_Gadelha what kind of classification are you talking about? Something like threat type? I agree with @hydn. However, I find that using ML to classify the rules can be hard, since some rules lack complete metadata (for example, missing references that would give context for the rule) or are configured in unusual ways that make the classification hard to interpret.

Wilker_Luiz_Gadelha · May 19, 2025, 9:51pm

My intention is to compare several views side by side. I would take values such as Suricata ID, signature, classification, and severity, and measure how efficient and effective each is at detecting anomalies. Using a minimal set of engineered attributes from the EVE logs, I want to compare how well they identify false positives, events that are not very significant under a given policy, and so on. The goal is to apply machine learning techniques to the events so that the result is as effective as the existing signature classification and severity definitions. I want to work in an unsupervised way, organizing events into groups (for example by time window) and exploring outlier mining. Has anyone worked on this using native data from Suricata's IDS structures?
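
One simple way to try the time-window and outlier idea is sketched below. It is not a full clustering approach: it swaps in a plain robust z-score (median and MAD) over per-window alert counts for each signature ID. It assumes EVE JSON lines with timestamps like `2025-05-01T01:12:00.123456+0000`. The function names, the 300-second window, and the 3.5 threshold are illustrative assumptions, not anything Suricata defines.

```python
import json
from collections import Counter
from datetime import datetime
from statistics import median

# Assumed EVE timestamp layout, e.g. 2025-05-01T01:12:00.123456+0000
TS_FORMAT = "%Y-%m-%dT%H:%M:%S.%f%z"


def window_counts(lines, window_seconds=300):
    """Count alerts per (window_start_epoch, signature_id)."""
    counts = Counter()
    for line in lines:
        event = json.loads(line)
        if event.get("event_type") != "alert":
            continue
        ts = datetime.strptime(event["timestamp"], TS_FORMAT).timestamp()
        window_start = int(ts // window_seconds) * window_seconds
        counts[(window_start, event["alert"]["signature_id"])] += 1
    return counts


def robust_outliers(counts, threshold=3.5):
    """Flag windows whose count is far from that signature's median count.

    Windows in which a signature never fired are not in `counts`, so they
    are not part of the baseline here.
    """
    per_sid = {}
    for (window, sid), n in counts.items():
        per_sid.setdefault(sid, []).append((window, n))

    flagged = []
    for sid, series in per_sid.items():
        values = [n for _, n in series]
        med = median(values)
        mad = median(abs(v - med) for v in values)
        if mad == 0:
            continue  # no spread to scale by; this sketch skips such signatures
        for window, n in series:
            score = 0.6745 * (n - med) / mad  # modified z-score
            if abs(score) > threshold:
                flagged.append((window, sid, n, round(score, 2)))
    return flagged
```

The flagged tuples are `(window_start_epoch, signature_id, count, score)`. Comparing them against the severity assigned by the rule is one way to check whether the unsupervised view agrees with the signature-based one.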