Byte_extract / byte_test string limits

I was attempting to write a rule that uses byte_extract and byte_test to validate that a 32-byte string extracted from the http.uri buffer was found again in the http.cookie buffer.

When I attempted the following logic:

http.uri; content:"&foo="; byte_extract:32,0,TEST,relative,string; http.cookie; content:"foo="; startswith; byte_test:32,=,TEST,0,relative,string;

I was presented with the following error:

<Error> - [ERRCODE: SC_ERR_INVALID_SIGNATURE(39)] - byte_extract can't process more than 20 bytes in "string" extraction

When I looked at the code, I found the following constants defined within detect-byte_extract.c

I’m left wondering a couple things:

  1. What is the reasoning/background behind these limitations?

  2. Is there a better way to enforce an extracted content match across multiple buffers?

    The only alternative I’ve found is using PCRE captures within a single buffer (http.start works OK in this example). But in testing, the keyword performance reports show the combination of byte_extract and byte_test taking about half as many ticks as PCRE.
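
For anyone following along, the single-buffer PCRE alternative boils down to a capture group plus a backreference. Here is a minimal Python sketch of that logic only. It is not Suricata rule syntax, and the pattern, the `token_reused` helper, and the assumption that the Cookie header starts with `foo=` and comes after the request line are all made up for illustration.

```python
import re

# Hypothetical pattern: capture the 32 characters that follow "&foo=" in the
# request line, then require the same 32 characters right after "Cookie: foo=".
TOKEN_REUSED = re.compile(
    r"&foo=(.{32}).*?\r\nCookie: foo=\1",
    re.DOTALL,  # let "." span the CRLF between request line and headers
)


def token_reused(raw_request: str) -> bool:
    """Return True when the URI value of foo reappears as the cookie value."""
    return TOKEN_REUSED.search(raw_request) is not None
```

The backreference `\1` is what ties the two locations together, which is why this only works when both values sit in the same inspected buffer.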

Regarding #1, I’m guessing it’s a limitation carried over from Snort. Snort 3 continues to limit the count parameter to values [1-10] (when string is used) and [1-4] otherwise.

The limitation exists because byte_extract has always been used to extract numeric quantities (hence the 1-4 and other value limitations) instead of being a general-purpose “byte extraction mechanism”. The numeric values are normally used with byte_jump, et al.
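
As a rough illustration of why a numeric extractor ends up with a length cap: the largest unsigned 64-bit value has exactly 20 decimal digits, so a longer decimal string cannot be a valid 64-bit number. The sketch below is Python, not Suricata code. That the extracted value is held in a 64-bit integer is an assumption here, and `parse_extracted` is a hypothetical helper.

```python
# Assumption for illustration: the extracted value must fit in an unsigned
# 64-bit integer.
UINT64_MAX = 2**64 - 1


def parse_extracted(digits: str) -> int:
    """Convert an extracted decimal string to an integer.

    Raises ValueError for anything that is not a plain decimal number or
    that would overflow 64 bits.
    """
    if not (digits.isascii() and digits.isdigit()):
        raise ValueError("not a decimal string")
    value = int(digits)
    if value > UINT64_MAX:
        raise ValueError("value does not fit in an unsigned 64-bit integer")
    return value
```

A 32-byte token like the one in the original rule fails both checks, which is the gap a “byte buffer” style of extraction would need to fill.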

We have a mechanism that may help – flowvars – but I don’t think that will allow the comparison logic to work the way you’d like.

That said, we could make a change to byte_extract to extract “byte buffers” with restrictions to prevent the value from being used in places where a numeric value is expected.

Thoughts @zoomequipd ?

We have a mechanism that may help – flowvars – but I don’t think that will allow the comparison logic to work the way you’d like.

I did check into those, and I agree, they don’t quite meet the use case.

That said, we could make a change to byte_extract to extract “byte buffers” with restrictions to prevent the value from being used in places where a numeric value is expected.

I’m all for it!

I found additional examples of this scattered in the ET ruleset. One of the more common ones is used within phishing sigs and uses PCRE capture groups on the http.header buffer to extract the host from the Referer header and compare it to the Host header.
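
To make the Referer/Host comparison concrete, here is a small Python sketch of the capture-group logic. It is not taken from any ET rule: the pattern and the `referer_matches_host` helper are hypothetical, and it assumes the Host header appears before the Referer header.

```python
import re

# Hypothetical pattern: capture the Host header value, skip any number of
# intervening header lines, then require the Referer URL to name that host.
HOST_THEN_REFERER = re.compile(
    r"^Host: ([^\r\n]+)\r\n(?:[^\r\n]*\r\n)*?Referer: https?://\1(?=[/:\r])",
    re.IGNORECASE | re.MULTILINE,
)


def referer_matches_host(headers: str) -> bool:
    """Return True when the Referer URL's host equals the Host header value."""
    return HOST_THEN_REFERER.search(headers) is not None
```

The trailing lookahead keeps a Referer such as `https://login.example.com.evil.net/` from passing as a match for `login.example.com`.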

I’ll get a feature submitted.

Wasn’t too sure how to title this request, but ticket here.