Unexpected behavior on depth/offset modifiers

Hello,
I’m analyzing rules and am a bit puzzled by the behavior described below.
Say a simple rule is set up:
alert tcp any any -> any any (msg:"MY TEST RULE"; flow:established; content:"|43 41 50 20|"; classtype:trojan-activity; sid:2103272; rev:3; metadata:created_at 2010_09_23, updated_at 2010_09_23;)

So I’m expecting this to be a TCP stream rule that is applied to the reassembled stream.
Running suricata with the --engine-analysis option confirms that:

Rule matches on reassembled stream

Adding a “depth: xxx” option to this rule makes it apply to packets as well.
I checked the source and found that SIG_FLAG_REQUIRE_PACKET is not set in detect-depth.c. However, it is set in detect-parse::SigValidate.
This case is explicitly handled here:

    sm = s->init_data->smlists[DETECT_SM_LIST_PMATCH];
    while (sm != NULL) {
        if (sm->type == DETECT_CONTENT &&
                (((DetectContentData *)(sm->ctx))->flags &
                 (DETECT_CONTENT_DEPTH | DETECT_CONTENT_OFFSET))) {
            s->flags |= SIG_FLAG_REQUIRE_PACKET;
            break;
        }
        sm = sm->next;
    }

Could someone clarify why this behavior is applied? From the user’s point of view, if a tcp rule is applied to the reassembled stream, then tcp + “depth or offset” should also be applied to it, as there is no hint that either of those keywords applies to the packet payload (as there is for, e.g., the dsize keyword).

PS:
Explicitly using tcp-stream instead of “tcp” makes the rule match on the reassembled stream only.

Thanks in advance for your answers!

The depth and offset modifiers are anchoring the match to the absolute start of a buffer, which in this case would be the TCP payload and/or stream data. The problem is that rule writers have for the longest time written rules that assume that a TCP payload == a protocol PDU, which is often true when looking at pcaps but is not a guarantee due to how TCP really works. It’s easy to send data 1 byte at a time in TCP (session splicing).

To address this we’re doing some tricks:

  • we apply these rules to packets directly, as these are most likely to contain what the rule writer intended
  • we apply these rules to the reassembled stream to avoid the session splicing case
  • protocols implement some logic to make Suricata apply PDU boundaries when inspecting these blocks of reassembled data from (hopefully) the correct position
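To make the first two points concrete, here is a minimal C sketch of the idea. It is not Suricata code: `match_with_depth`, `inspect_flow` and the `verdict` enum are hypothetical names invented for illustration, and "reassembly" is simulated by cutting one buffer into fixed-size segments. It ignores the third point (protocol-provided PDU boundaries) and simply treats the first byte of the data as the start of the stream.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not Suricata's API: true if `pat` occurs entirely
 * within the first `depth` bytes of `buf` (roughly content + depth). */
static bool match_with_depth(const unsigned char *buf, size_t buf_len,
                             const unsigned char *pat, size_t pat_len,
                             size_t depth)
{
    size_t limit = buf_len < depth ? buf_len : depth;
    if (pat_len == 0 || pat_len > limit)
        return false;
    for (size_t i = 0; i + pat_len <= limit; i++) {
        if (memcmp(buf + i, pat, pat_len) == 0)
            return true;
    }
    return false;
}

enum verdict { NO_MATCH = 0, PACKET_MATCH = 1, STREAM_MATCH = 2 };

/* Hypothetical two-pass inspection. `data` is the full byte stream and
 * `seg_size` simulates how the sender split it into TCP segments. */
static enum verdict inspect_flow(const unsigned char *data, size_t data_len,
                                 size_t seg_size,
                                 const unsigned char *pat, size_t pat_len,
                                 size_t depth)
{
    assert(seg_size > 0);

    /* Pass 1: each segment on its own, anchored at the segment start. */
    for (size_t off = 0; off < data_len; off += seg_size) {
        size_t len = data_len - off < seg_size ? data_len - off : seg_size;
        if (match_with_depth(data + off, len, pat, pat_len, depth))
            return PACKET_MATCH;
    }

    /* Pass 2: the reassembled data, anchored at the stream start. */
    if (match_with_depth(data, data_len, pat, pat_len, depth))
        return STREAM_MATCH;

    return NO_MATCH;
}
```

With the bytes `43 41 50 20` as the pattern and a depth of 4, the data `CAP LS\r\n` sent as one 8-byte segment gives `PACKET_MATCH`. The same data sent one byte per segment fails every per-packet check and only matches in the second pass (`STREAM_MATCH`), which is the session splicing case. The data is also scanned twice whenever the first pass fails.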

This is a bit of a best-effort approach that is both inefficient (data is scanned twice) and not very exact. This is why for Suricata 7 and beyond we’re working on proper PDU support in the rule language:

https://redmine.openinfosecfoundation.org/issues/4174

The reason this isn’t done for rules that only use distance or within is that these are relative modifiers: they are not anchored to the start of a PDU/packet, so we can apply them to just the reassembled data.
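The difference between anchored and relative modifiers can be shown with another small C sketch. Again, this is illustrative only and not how Suricata implements matching: `find_in_window`, `anchored_match` and `relative_match` are hypothetical helpers, and the second pattern `LS` is an assumed example that does not come from the rule above. The anchored check mimics `content:"CAP "; depth:4;`, the relative one mimics `content:"CAP "; content:"LS"; distance:0; within:2;`.

```c
#include <stddef.h>
#include <string.h>

static const unsigned char CAP[] = {'C', 'A', 'P', ' '};
static const unsigned char LS[] = {'L', 'S'};

/* Hypothetical helper: look for `pat` inside buf[start, start + window)
 * (clamped to the buffer) and return the offset just past the match,
 * or -1 if there is none. */
static long find_in_window(const unsigned char *buf, size_t len,
                           size_t start, size_t window,
                           const unsigned char *pat, size_t pat_len)
{
    if (start > len)
        return -1;
    size_t end = (len - start < window) ? len : start + window;
    if (pat_len == 0 || pat_len > end - start)
        return -1;
    for (size_t i = start; i + pat_len <= end; i++) {
        if (memcmp(buf + i, pat, pat_len) == 0)
            return (long)(i + pat_len);
    }
    return -1;
}

/* Anchored: the search window is fixed to the first 4 bytes of the buffer,
 * so the result depends on where the buffer starts. */
static int anchored_match(const unsigned char *buf, size_t len)
{
    return find_in_window(buf, len, 0, 4, CAP, sizeof CAP) >= 0;
}

/* Relative: "CAP " may be anywhere; "LS" must fit within the 2 bytes that
 * follow it. Every occurrence of "CAP " is tried in turn. */
static int relative_match(const unsigned char *buf, size_t len)
{
    size_t from = 0;
    for (;;) {
        long after = find_in_window(buf, len, from, len - from,
                                    CAP, sizeof CAP);
        if (after < 0)
            return 0;
        if (find_in_window(buf, len, (size_t)after, 2, LS, sizeof LS) >= 0)
            return 1;
        /* Retry from the byte after the start of this occurrence. */
        from = (size_t)after - sizeof CAP + 1;
    }
}
```

For the buffer `CAP LS\r\n` both checks succeed. If two extra bytes precede it (`xxCAP LS\r\n`), as can happen when the inspected data does not start on a PDU boundary, `anchored_match` fails while `relative_match` still succeeds, because the relative check only depends on the position of the previous match.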