MPM bypass commit from 2012, and the purpose of SPM & MPM

dominusmi · October 2, 2020, 9:39am

While looking into what exactly happens for rule content matching, I came upon this commit:

As you can see, it comments out some lines. Without that comment, an MPM match would go directly to the match section. As of right now, a rule needs both an MPM match and an SPM match. I think the SPM match always succeeds since it checks the same content, but it means scanning twice.

Besides this commit, which was apparently meant to be temporary and has never been changed since, I’ve not been able to find the reasoning behind MPM and SPM or their respective roles. All I’ve managed to understand is that they stand for Multi and Single Pattern Matching, and that MPM is done first (as a prefilter, I guess) and SPM second.
SPM then does the more in-depth analysis; for instance, it is what handles the distance, within, etc. keywords from the rule.

Is there any specific documentation or explanation? It would appear, especially since Hyperscan has been introduced, that the SPM is not as necessary as it might have been before, so I’m trying to understand whether this is more of a legacy structure or whether there’s something I’m missing (very likely, I would add).

vjulien · October 6, 2020, 7:54pm

I don’t remember the specifics of that commit. 2012 is a long time ago.

In general, the MPM takes a lot of patterns and inspects a payload against them all at once. Then, for the rules that have an MPM match, we run SPM to validate the match. This may be unnecessary in some cases, but in many cases it is necessary, as the rule language is much more expressive than the MPM can handle. For example, there may be relations between the positions of 2 or more patterns, byte operations, pcre, etc.
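
To make the two stages concrete, here is a minimal Python sketch of the idea, not Suricata's actual code. The names `Rule`, `mpm_prefilter`, `spm_validate` and `match` are hypothetical, and the prefilter is a plain loop standing in for a real multi-pattern matcher (which would scan the payload once for all patterns). The validation step only checks that the patterns occur in order, as one simple example of a positional relation the prefilter cannot express.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Rule:
    sid: int
    fast_pattern: bytes            # the single pattern handed to the prefilter
    ordered_contents: List[bytes]  # all patterns; they must occur in this order


def mpm_prefilter(payload: bytes, rules: List[Rule]) -> List[Rule]:
    """Stand-in for the MPM: keep rules whose fast pattern occurs anywhere."""
    return [r for r in rules if r.fast_pattern in payload]


def spm_validate(payload: bytes, rule: Rule) -> bool:
    """Search each pattern one at a time, starting after the previous match."""
    pos = 0
    for pattern in rule.ordered_contents:
        idx = payload.find(pattern, pos)
        if idx < 0:
            return False
        pos = idx + len(pattern)
    return True


def match(payload: bytes, rules: List[Rule]) -> List[int]:
    """Return the sids of rules that pass both the prefilter and validation."""
    candidates = mpm_prefilter(payload, rules)
    return [r.sid for r in candidates if spm_validate(payload, r)]


rules = [
    Rule(sid=1, fast_pattern=b"evil", ordered_contents=[b"GET", b"evil"]),
    Rule(sid=2, fast_pattern=b"admin", ordered_contents=[b"admin"]),
]
```

With these rules, the payload `b"evil GET /index"` passes the prefilter for sid 1 (it contains `evil`) but fails validation, because `evil` does not occur after `GET`. That is the kind of case where the second stage is needed.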

I think it could be interesting to see if we can reenable this potential optimization for the simple patterns that the MPM can fully validate.
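
As a rough illustration of what such a bypass could look like, here is a hedged Python sketch. It is not how Suricata implements or would implement it. `SimpleRule`, `mpm_only` and `match_with_bypass` are hypothetical names, the `in` test stands in for the MPM, and "no extra checks" is a simplifying assumption for "the MPM can fully validate this rule". In practice that criterion would also have to account for things like case handling and offset/depth modifiers.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class SimpleRule:
    sid: int
    fast_pattern: bytes
    # Extra conditions the prefilter cannot check (hypothetical callables).
    extra_checks: List[Callable[[bytes], bool]] = field(default_factory=list)

    @property
    def mpm_only(self) -> bool:
        # Assumption: a rule with no extra checks is fully decided by the MPM.
        return not self.extra_checks


def match_with_bypass(payload: bytes, rules: List[SimpleRule],
                      stats: Dict[str, int]) -> List[int]:
    alerts = []
    for rule in rules:
        if rule.fast_pattern not in payload:  # stand-in for the MPM stage
            continue
        if rule.mpm_only:
            # The prefilter hit is already the full answer: skip validation.
            stats["bypassed"] += 1
            alerts.append(rule.sid)
            continue
        stats["validated"] += 1
        if all(check(payload) for check in rule.extra_checks):
            alerts.append(rule.sid)
    return alerts


bypass_rules = [
    SimpleRule(sid=1, fast_pattern=b"admin"),
    SimpleRule(sid=2, fast_pattern=b"login",
               extra_checks=[lambda p: len(p) > 20]),
]
```

The `stats` counters are only there to show how many rules skipped the second scan; the saving depends on how many rules in a ruleset are simple enough to qualify.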