EVE log JSON keys inconsistent order

Just a general question about writing the EVE log to syslog. I noticed the JSON provided has the keys in a somewhat random order. The documentation for all versions of Suricata completely skips over describing the function of the “Preserve-order” option. If I change this to “no”, will the JSON keys be sorted in alphabetical order?

File contents:
json:
  # Sort object keys in the same order as they were inserted
  preserve-order: yes

What does “same order as they were inserted” actually mean?

As of Suricata 6 and newer (perhaps 5 as well), this configuration parameter has little effect, except for maybe stats logs.

Our JSON events will always have their order preserved, because each field is written to a buffer as it is added, rather than the usual approach of adding fields to an unordered hashmap and then writing it out.
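To illustrate the difference, here is a minimal Python sketch. It is not Suricata's actual code; the `BufferJsonBuilder` class and its `set`/`close` methods are hypothetical names made up for this example. Each key is serialized into the buffer the moment it is added, so the output order can only be the insertion order.

```python
import json


class BufferJsonBuilder:
    """Hypothetical illustration: serialize each key/value as soon as it is set."""

    def __init__(self):
        self._parts = []

    def set(self, key, value):
        # Written out immediately; there is no map that could reorder keys later.
        self._parts.append(f"{json.dumps(key)}:{json.dumps(value)}")
        return self

    def close(self):
        return "{" + ",".join(self._parts) + "}"


def build_event():
    return (
        BufferJsonBuilder()
        .set("timestamp", "2024-06-05T15:10:21.802788+0000")
        .set("event_type", "alert")
        .set("src_ip", "10.x.x.x")
        .close()
    )


print(build_event())
```

With a builder like this, two events with the same fields cannot come out in different orders unless something downstream parses and re-encodes them.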

Where you might see what looks like an order change is when one event has a field that another doesn’t; the extra field could sit in between two others, for example.

What are you seeing that appears random?

Cleaned up two logs to use as an example…

suricata[152149]: {'timestamp': '2024-06-05T15:10:21.802788+0000', 'flow': {'bytes_toserver': 72308119, 'pkts_toserver': 49789, 'pkts_toclient': 9741, 'bytes_toclient': 642906, 'start': '2024-06-05T15:10:16.676392+0000'}, 'src_ip': '10.x.x.x', 'proto': 'TCP', 'flow_id': 829409721864744, 'event_type': 'alert', 'packet': 'x=', 'dest_ip': '10.x.x.x', 'in_iface': 'enp101s0f1', 'payload_printable': 'xyz', 'vlan': [1], 'stream': 0, 'src_port': 38840, 'dest_port': 514, 'alert': {'action': 'allowed', 'metadata': {'attack_target': ['Client_and_Server'], 'updated_at': ['2019_07_26'], 'signature_severity': ['Critical'], 'performance_impact': ['Low'], 'deployment': ['Datacenter', 'Perimeter'], 'created_at': ['2016_09_15'], 'affected_product': ['Windows_XP_Vista_7_8_10_Server_32_64_Bit']}, 'gid': 1, 'signature_id': 2023221, 'severity': 1, 'rev': 1, 'signature': 'ET MALWARE Windows WMIC PROCESS get Microsoft Windows DOS prompt command exit OUTBOUND', 'category': 'A Network Trojan was detected'}, 'payload': 'xyz=', 'packet_info': {'linktype': 1}}

suricata[29683]: {'src_ip': '10.x.x.x', 'timestamp': '2024-06-05T02:33:28.618360+0000', 'event_type': 'alert', 'packet_info': {'linktype': 1}, 'dest_port': 443, 'payload': '', 'flow_id': 1039978255118200, 'src_port': 46142, 'in_iface': 'enp1s0f1', 'vlan': [1], 'dest_ip': '103.x.x.x', 'proto': 'TCP', 'stream': 0, 'alert': {'action': 'allowed', 'gid': 1, 'signature_id': 50150081, 'signature': 'AVERT Connection to known cryptomining node 81', 'rev': 1, 'category': 'Misc Attack', 'severity': 2}, 'payload_printable': '', 'flow': {'pkts_toserver': 1, 'pkts_toclient': 0, 'bytes_toserver': 74, 'bytes_toclient': 0, 'start': '2024-06-05T02:33:28.618360+0000'}, 'packet': 'xyz='}

The first log starts with the timestamp, while the second has the source IP first. These logs are from two different servers.

Editing to also mention the fields are mostly identical between the logs, just in a different order.

What version of Suricata? Is there anything in between that could be parsing and re-generating the JSON? Modern versions of Suricata won’t do this.

Unfortunately, I don’t have access to the servers producing these logs to know what version of Suricata is being utilized. I will forward the question though. Do you know approximately what version of Suricata wouldn’t exhibit this behavior?

I do know that the logs are being forwarded directly from the servers using rsyslog, so they’re not being modified.

I’m just a SIEM consultant trying to parse metadata out of these logs using regex. I know I could write a workable regex solution using look-ahead for all the fields, but the performance impact is too damaging considering the volume.

As of 6.0, which is coming close to EOL, it is impossible for Suricata to log these in random order due to the JSON output implementation.

Even in 5.0 (EOL Aug 2022) and older, we were pretty good about keeping order, in particular keeping the timestamp first.

If I had to guess, rsyslog, which can be JSON-aware, is doing some re-encoding.

Thank you very much for your replies. I noticed some patterns in my 30k sample logs.

  1. Each server has a random order for the JSON keys, but it is consistent for a period of time before it changes. For example, server 1’s first three keys are proto, dest_ip and timestamp on 6/5/24 but then change to dest_ip, timestamp and proto on 6/11/24.
  2. All servers output a different order. So for 40+ servers, I’ve got 40+ different orders at any given time.
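A pattern check like the one described above could be sketched in Python as follows. This is only an assumption about how such an analysis might be done: the function names, the server labels, and the sample records are hypothetical, and the sketch assumes each line carries valid double-quoted JSON after the syslog prefix (single-quoted samples would need converting first).

```python
import json
from collections import Counter, defaultdict


def leading_keys(line, n=3):
    """Return the first n top-level keys of one syslog line, in written order."""
    # Assumes the JSON object starts at the first '{' after the syslog prefix.
    payload = line[line.index("{"):]
    # json.loads keeps keys in the order they appear in the text.
    return tuple(json.loads(payload))[:n]


def orders_by_server(records):
    """Map each server name to a Counter of the leading key orders it emitted."""
    result = defaultdict(Counter)
    for server, line in records:
        result[server][leading_keys(line)] += 1
    return result


# Hypothetical sample records; real input would come from the collected logs.
records = [
    ("server1", 'suricata[1]: {"proto": "TCP", "dest_ip": "10.x.x.x", "timestamp": "t1", "stream": 0}'),
    ("server1", 'suricata[1]: {"dest_ip": "10.x.x.x", "timestamp": "t2", "proto": "TCP", "stream": 0}'),
    ("server2", 'suricata[2]: {"timestamp": "t3", "src_ip": "10.x.x.x", "proto": "TCP", "stream": 0}'),
]

for server, counts in orders_by_server(records).items():
    print(server, dict(counts))
```

Grouping the same counts by date as well would show when a given server's order changes.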

My question is if the endpoints are using an EOL version of Suricata and they update, will the order be identical for any and all Suricata servers?

Given the same configuration, version, and options, it should be. But it can change across versions as we add new fields, etc.

But it’s also important to note that, even then, there might be “missing” keys. For example, an http entry will only contain an http_user_agent if there was a user agent on the wire. Otherwise that object won’t contain an http_user_agent, for example:

{
  "hostname": "ocsp.pki.goog",
  "url": "/gts1c3/A22EpJ%2F%2BOR8ASY7RsaCnxxQ%3D%3D",
  "http_user_agent": "com.apple.trustd/3.0",
  "http_content_type": "application/ocsp-response",
  "http_method": "GET",
  "protocol": "HTTP/1.1",
  "status": 200,
  "length": 472
}
{
  "hostname": "fedoraproject.org",
  "url": "/static/hotspot.txt",
  "http_content_type": "text/plain",
  "http_method": "GET",
  "protocol": "HTTP/1.1",
  "status": 200,
  "length": 2
}

But the order out of Suricata will be the same. And when there are array values, it’s typically the order seen on the wire. Just some things to be conscious of if not using a proper JSON parser.
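For comparison, this is roughly what a proper JSON parser buys you, sketched in Python. The `parse_http` helper and the choice of fields are hypothetical; the two inputs are trimmed versions of the http objects above. Fields are looked up by name, so neither key order nor a missing optional key matters.

```python
import json


def parse_http(payload):
    """Extract selected http fields by name; order and absence do not matter."""
    http = json.loads(payload)
    return {
        "hostname": http.get("hostname"),
        # .get returns None when no user agent was seen on the wire.
        "user_agent": http.get("http_user_agent"),
        "status": http.get("status"),
    }


with_ua = '{"hostname": "ocsp.pki.goog", "http_user_agent": "com.apple.trustd/3.0", "status": 200}'
without_ua = '{"status": 200, "hostname": "fedoraproject.org"}'

print(parse_http(with_ua))
print(parse_http(without_ua))
```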

Thanks again, that’s great info! A missing/optional field can easily and efficiently be handled by regex, as long as the order is consistent. I will encourage my customer to update their Suricata version.
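A regex with an optional group, as described, might look like the sketch below, written with Python's `re` module. It rests on several assumptions: the keys arrive in a fixed order, the JSON is compact (no spaces after `:` or `,`), and values contain no escaped quotes. The pattern, the `extract` helper, and the sample lines are hypothetical.

```python
import re

# The optional non-capturing group makes http_user_agent skippable
# without changing the fixed order of the surrounding keys.
HTTP_RE = re.compile(
    r'"hostname":"(?P<hostname>[^"]*)",'
    r'"url":"(?P<url>[^"]*)",'
    r'(?:"http_user_agent":"(?P<user_agent>[^"]*)",)?'
    r'"http_content_type":"(?P<content_type>[^"]*)"'
)


def extract(line):
    """Return the named groups as a dict, or None if the line does not match."""
    match = HTTP_RE.search(line)
    return match.groupdict() if match else None


with_ua = (
    '{"hostname":"ocsp.pki.goog","url":"/gts1c3/A22EpJ%2F%2BOR8ASY7RsaCnxxQ%3D%3D",'
    '"http_user_agent":"com.apple.trustd/3.0","http_content_type":"application/ocsp-response"}'
)
without_ua = '{"hostname":"fedoraproject.org","url":"/static/hotspot.txt","http_content_type":"text/plain"}'

print(extract(with_ua))
print(extract(without_ua))
```

When the user agent is absent, the `user_agent` group comes back as `None` and the remaining fields still match.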