Byte_extract / byte_test string limits

bmurphy · March 1, 2024, 9:13pm

I was attempting to write a rule that uses byte_extract and byte_test to validate that a 32-byte string extracted from the http.uri buffer appears again in the http.cookie buffer.

When I attempted the following logic

http.uri; content:"&foo="; byte_extract:32,0,TEST,relative,string; http.cookie; content:"foo="; startswith; byte_test:32,=,TEST,0,relative,string;

I was presented with the following error:

<Error> - [ERRCODE: SC_ERR_INVALID_SIGNATURE(39)] - byte_extract can't process more than 20 bytes in "string" extraction

When I looked at the code, I found the following constants defined in detect-byte-extract.c

https://github.com/OISF/suricata/blob/6d0e11e76c8e02deada688b767523512b70e51ec/src/detect-byte-extract.c#L71-L76

          /* the max no of bytes that can be extracted in string mode - (string, hex)
           * (string, oct) or (string, dec) */
          #define STRING_MAX_BYTES_TO_EXTRACT_FOR_OCT 23
          #define STRING_MAX_BYTES_TO_EXTRACT_FOR_DEC 20
          #define STRING_MAX_BYTES_TO_EXTRACT_FOR_HEX 14
          /* the max no of bytes that can be extracted in non-string mode */

I’m left wondering a couple things:

1. What is the reasoning/background behind these limitations?
2. Is there a better way to enforce that extracted content matches across multiple buffers?

The only alternative I’ve found is using PCRE captures within a single buffer (http.start works ok in this example). But in testing, the keyword performance reports the combination of byte_extract and byte_test taking about half as many ticks as PCRE.
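
For comparison, the single-buffer PCRE approach can be illustrated outside Suricata. The sketch below uses Python's `re` module rather than Suricata's `pcre` keyword, and the request text, the `foo` parameter layout, and the function name are illustrative assumptions. It captures the 32-byte value after `&foo=` in the request line and uses a backreference to require the same bytes after `foo=` in the Cookie header.

```python
import re

# Capture 32 non-delimiter characters after "&foo=" in the request line, then
# require the same text (backreference \1) at the start of the Cookie value.
PATTERN = re.compile(
    r"&foo=([^&\s]{32})"        # value taken from the URI
    r".*?\r\nCookie: foo=\1",   # must reappear in the cookie
    re.DOTALL,
)

def uri_value_repeated_in_cookie(raw_request: str) -> bool:
    """Hypothetical helper: True if the URI value is repeated in the cookie."""
    return PATTERN.search(raw_request) is not None

# Made-up sample requests for illustration only.
token = "0123456789abcdef0123456789abcdef"
matching = f"GET /a?x=1&foo={token} HTTP/1.1\r\nHost: example.com\r\nCookie: foo={token}\r\n\r\n"
different = f"GET /a?x=1&foo={token} HTTP/1.1\r\nHost: example.com\r\nCookie: foo={'f' * 32}\r\n\r\n"
```

Because the capture and the backreference live in one pattern, both values have to be visible in the same buffer, which is why this approach needs something like http.start rather than separate http.uri and http.cookie matches.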

Jeff_Lucovsky · March 2, 2024, 2:30pm

Regarding #1, I’m guessing it’s a limitation carried over from Snort. Snort 3 continues to limit the count parameter to values [1-10] (when string is used) and [1-4] otherwise.

The limitation exists because byte_extract has always been used to extract numeric quantities (hence the 1-4 and other value limitations) instead of being a general-purpose “byte extraction mechanism”. The numeric values are normally used with byte_jump, et al.
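
To make the numeric framing concrete, here is a minimal Python sketch (not Suricata code). It assumes the extracted text is parsed into an unsigned 64-bit integer, which this thread does not state explicitly, and `parse_extracted` is a hypothetical helper. Under that assumption a 32-character string cannot be held as a single number.

```python
UINT64_MAX = 2**64 - 1

def parse_extracted(text: str, base: int = 10) -> int:
    """Hypothetical helper: parse extracted text as a number that must fit in 64 bits."""
    value = int(text, base)
    if value < 0 or value > UINT64_MAX:
        raise OverflowError("value does not fit in 64 bits")
    return value

# Digits needed to write the largest 64-bit value in each base.
digits_dec = len(str(UINT64_MAX))          # decimal
digits_hex = len(format(UINT64_MAX, "x"))  # hexadecimal, no "0x" prefix
digits_oct = len(format(UINT64_MAX, "o"))  # octal, no leading "0"
```

Here `digits_dec` is 20, which equals STRING_MAX_BYTES_TO_EXTRACT_FOR_DEC. The hex and octal counts (16 and 22) do not line up exactly with the constants 14 and 23, and nothing in this thread explains that difference.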

We have a mechanism that may help – flowvars – but I don’t think that will allow the comparison logic to work the way you’d like.

That said, we could make a change to byte_extract to extract “byte buffers” with restrictions to prevent the value from being used in places where a numeric value is expected.

Thoughts @zoomequipd ?

bmurphy · March 4, 2024, 5:48pm

We have a mechanism that may help – flowvars – but I don’t think that will allow the comparison logic to work the way you’d like.

I did check into those, and I agree, they don’t quite meet the use case.

That said, we could make a change to byte_extract to extract “byte buffers” with restrictions to prevent the value from being used in places where a numeric value is expected.

I’m all for it!

I found additional examples of this scattered throughout the ET ruleset. One of the more common ones appears in phishing sigs, which use PCRE capture groups on the http.header buffer to extract the host from the Referer header and compare it to the Host header.

I’ll get a feature submitted.

bmurphy · March 5, 2024, 6:13pm

Wasn’t too sure how to title this request, but ticket here.