How to create a dataset using http.host and http.uri


I have a URL blocklist feed that I want to integrate with Suricata. Is it possible to create a single dataset and a rule that uses both http.host and http.uri from that dataset?


Unfortunately not. AFAIK there is no way to implement malicious URL detection with datasets, because Suricata has no keyword that represents the host and the URI together.
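A dataset is matched against one sticky buffer at a time, so the closest you can get is one dataset per buffer. The sketch below assumes a feed of full URLs, one per line, and uses hypothetical file names (`bl-hosts.lst`, `bl-uris.lst`). It writes the base64-encoded values that `type string` datasets load. As the rule in the trailing comment shows, checking both sets would fire on any blocklisted host combined with any blocklisted URI, which is exactly the pairing problem described above.

```python
import base64
from urllib.parse import urlsplit


def split_feed(urls):
    """Split full URLs into separate sets of hosts and URIs (path + query)."""
    hosts, uris = set(), set()
    for url in urls:
        url = url.strip()
        if not url:
            continue  # skip blank lines in the feed
        parts = urlsplit(url)
        if parts.hostname is None:
            continue  # skip lines that are not absolute URLs
        hosts.add(parts.hostname)  # urlsplit lowercases the hostname
        uri = parts.path or "/"
        if parts.query:
            uri += "?" + parts.query
        uris.add(uri)
    return hosts, uris


def write_dataset(path, values):
    """Write one base64-encoded value per line, sorted for stable output."""
    with open(path, "w") as f:
        for value in sorted(values):
            f.write(base64.b64encode(value.encode()).decode() + "\n")


# Hypothetical rule using both files. It does NOT tie a host to its own URI:
# alert http any any -> any any (msg:"blocklisted host and uri"; \
#   http.host; dataset:isset,bl-hosts,type string,load bl-hosts.lst; \
#   http.uri; dataset:isset,bl-uris,type string,load bl-uris.lst; sid:1000001;)
```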

Can you provide several examples of the http.host and http.uri values you’d like to combine?

For example, I would like to create a single rule that could match all of these http.host and http.uri combinations.

Maybe a Python script that generates one rule per URL:

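f = open("rules.txt")
data = f.read().strip().split("\n")

def makerule(host,port,uri):
  # http.host and http.uri are separate sticky buffers, so each gets its own content match
  return 'alert http any %s -> any any (http.host;content:"%s";http.uri;content:"%s";)' % (port,host,uri)

for line in data:
  # split "scheme://host[:port]/path" on "/" into scheme, host[:port] and path parts
  protocol,null,hostport,*url = line.split("/")
  uri = "/".join(url)
  host,*port = hostport.split(":")

  # no explicit port in the URL: fall back to the default for the scheme
  if port == []:
    if protocol == 'http':
      port = [str(80)]
    else:
      port = [str(443)]

  print("# Input",line)
  print(makerule(host,port[0],uri))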

Producing something like:

# Input
alert http any 8443 -> any any (http.host;content:"";http.uri;content:"pixel.gif";)

# Input
alert http any 443 -> any any (http.host;content:"";http.uri;content:"cx";)

# Input
alert http any 443 -> any any (http.host;content:"";http.uri;content:"cm";)

# Input
alert http any 4433 -> any any (http.host;content:"";http.uri;content:"__utm.gif";)

# Input
alert http any 443 -> any any (http.host;content:"";http.uri;content:"subscribeEvent";)

Thanks for your valuable help.

I'm not much of a Python expert, but when I run this script on the URL feed itself, some of the results look like this:

# Input
alert http any 443 -> any any (http.host;content:"";http.uri;content:"";)

According to the script, this rule should get port 80 instead of 443.

Many thanks!

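You just need to add a colon (:) to the comparison: if protocol == 'http:':. Splitting "http://..." on "/" leaves the colon attached to the scheme, so protocol is 'http:', not 'http', and every URL without an explicit port falls through to 443.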
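As an alternative to splitting on "/" by hand (a swapped-in approach, not the script above), Python's standard `urllib.parse.urlsplit` returns the scheme without the colon and parses the host and port itself. A minimal sketch, assuming the feed only contains http and https URLs; the name `parse_url` is hypothetical:

```python
from urllib.parse import urlsplit

# Assumed default ports; a URL with any other scheme raises KeyError.
DEFAULT_PORTS = {"http": 80, "https": 443}


def parse_url(url):
    """Return (host, port, uri) for one blocklist URL."""
    parts = urlsplit(url.strip())
    # parts.scheme is e.g. "http", without the colon that line.split("/") keeps
    port = parts.port or DEFAULT_PORTS[parts.scheme]
    uri = parts.path or "/"
    if parts.query:
        uri += "?" + parts.query
    return parts.hostname, port, uri
```

For example, `parse_url("http://example.com/cm")` gives `("example.com", 80, "/cm")`. Note that the URI keeps its leading "/", unlike the original script's output.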

Seems perfect! But I just remembered another problem: the SID. Maybe there is a way to assign consecutive SIDs to the rules, but it will increase the complexity of the script.
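Consecutive SIDs only need a counter in the generating loop. A sketch, assuming a starting SID of 1000000 (a range commonly used for local rules; pick one that doesn't collide with your other rulesets) and the same `rules.txt` input file; `make_rules` and `escape_content` are hypothetical names. Unlike the script earlier in the thread, it puts the port on the destination side of the header (`any any -> any <port>`), since the blocklisted server is the destination of the HTTP request. It also writes `"`, `;`, `:`, `|` and `\` in hex notation, because they are special inside a content string.

```python
from urllib.parse import urlsplit

# Assumed default server ports per scheme.
DEFAULT_PORTS = {"http": 80, "https": 443}

# Characters written as |XX| hex inside a Suricata content string.
SPECIAL = '"\\;:|'


def escape_content(value):
    """Replace rule-special characters with |XX| hex notation."""
    return "".join("|%02X|" % ord(c) if c in SPECIAL else c for c in value)


def make_rules(urls, first_sid=1000000):
    """Build one rule per URL, numbering sids consecutively from first_sid."""
    rules = []
    sid = first_sid
    for url in urls:
        url = url.strip()
        if not url:
            continue  # skip blank lines in the feed
        parts = urlsplit(url)
        port = parts.port or DEFAULT_PORTS[parts.scheme]
        uri = parts.path or "/"
        if parts.query:
            uri += "?" + parts.query
        # hostname is already lowercase, which http.host content requires
        rules.append(
            'alert http any any -> any %d (msg:"URL blocklist match"; '
            'http.host; content:"%s"; http.uri; content:"%s"; '
            "sid:%d; rev:1;)"
            % (port, escape_content(parts.hostname), escape_content(uri), sid)
        )
        sid += 1
    return rules


if __name__ == "__main__":
    with open("rules.txt") as f:
        for rule in make_rules(f):
            print(rule)
```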

Maybe Suricata should supply an http.url buffer for this use case…