How to create a dataset using http.host and http.uri

Hi,

I have a URL blocklist feed that I want to integrate with Suricata. Is it possible to create a single dataset and a rule that matches on both http.host and http.uri from that dataset?

Thanks.

Unfortunately no. As far as I know, there is no way to implement malicious URL detection using datasets, because there is no keyword that represents the host and URI together.

Can you provide several examples of http.host and uri you’d like to combine?

For example, I would like to create a single rule that could match all of the http.host and http.uri combinations in this feed:

https://threatview.io/Downloads/URL-High-Confidence-Feed.txt

Maybe a python script:

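def makerule(host, port, uri):
    # Build one alert rule that matches the given host and URI
    return 'alert http any %s -> any any (http.host;content:"%s";http.uri;content:"%s";)' % (port, host, uri)

# Read the feed, one URL per line
with open("rules.txt") as f:
    data = f.read().strip().split("\n")

for line in data:
    # Split into scheme, empty segment, host[:port] and the path segments
    protocol, null, hostport, *url = line.split("/")
    uri = "/".join(url)
    host, *port = hostport.split(":")

    # No explicit port in the URL: fall back to a default based on the scheme
    if port == []:
        if protocol == 'http':
            port = [str(80)]
        else:
            port = [str(443)]

    print("# Input", line)
    print(makerule(host, port[0], uri))
    print("")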

Producing something like:

# Input https://23.94.240.207:8443/pixel.gif
alert http any 8443 -> any any (http.host;content:"23.94.240.207";http.uri;content:"pixel.gif";)

# Input https://23.94.240.207/cx
alert http any 443 -> any any (http.host;content:"23.94.240.207";http.uri;content:"cx";)

# Input https://43.137.8.159/cm
alert http any 443 -> any any (http.host;content:"43.137.8.159";http.uri;content:"cm";)

# Input https://43.153.117.9:4433/__utm.gif
alert http any 4433 -> any any (http.host;content:"43.153.117.9";http.uri;content:"__utm.gif";)

# Input https://6401f.samples.muzikcitysound.com/subscribeEvent
alert http any 443 -> any any (http.host;content:"6401f.samples.muzikcitysound.com";http.uri;content:"subscribeEvent";)

Thanks for your valuable help.

I'm not much of a Python expert, but when I run this script on the URL feed itself, some of the results look like this, for example:

# Input http://yawyawvaryyaaaa.com/
alert http any 443 -> any any (http.host;content:"yawyawvaryyaaaa.com";http.uri;content:"";)

According to the script, this rule should be getting port 80 instead of 443.

Many thanks!

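You just need to add a colon (:) to the comparison, so that it reads if protocol == 'http:': (splitting the line on "/" leaves the colon attached to the scheme).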
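Another way to avoid the colon issue is to let the standard library parse the URL. This is only a minimal sketch: split_url and DEFAULT_PORTS are hypothetical names, and it assumes the feed contains only http and https URLs.

```python
from urllib.parse import urlsplit

# Assumption: the feed only contains http and https URLs
DEFAULT_PORTS = {"http": 80, "https": 443}

def split_url(line):
    """Return (host, port, uri) for one feed line (hypothetical helper)."""
    parts = urlsplit(line.strip())
    # parts.scheme carries no trailing colon, so "http" compares as expected
    port = parts.port or DEFAULT_PORTS.get(parts.scheme, 443)
    # Drop the leading slash to mirror the uri format of the script above
    uri = parts.path[1:] if parts.path.startswith("/") else parts.path
    if parts.query:
        uri += "?" + parts.query
    return parts.hostname, port, uri

print(split_url("http://yawyawvaryyaaaa.com/"))
print(split_url("https://23.94.240.207:8443/pixel.gif"))
```

The returned port is an integer here, so it can be passed to makerule as is (the %s placeholder formats it without a conversion).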

Seems perfect! I just remembered another problem, though: the SID. Maybe there is a way to assign consecutive SIDs to the rules, but it will increase the complexity of the script.
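Consecutive SIDs should not add much complexity if the rules are numbered while they are generated. A rough sketch, assuming a starting SID of 1000000 (pick any range that is unused in your rule set); makerules and BASE_SID are hypothetical names, and makerule is the function from the script above with a sid parameter added:

```python
# Assumption: 1000000 is free in the local rule set; adjust as needed
BASE_SID = 1000000

def makerule(host, port, uri, sid):
    # Same rule layout as the script in this thread, with a sid appended
    return ('alert http any %s -> any any '
            '(http.host;content:"%s";http.uri;content:"%s";sid:%d;)'
            % (port, host, uri, sid))

def makerules(entries, base_sid=BASE_SID):
    """Turn (host, port, uri) tuples into rules with consecutive SIDs."""
    return [makerule(host, port, uri, sid)
            for sid, (host, port, uri) in enumerate(entries, start=base_sid)]

for rule in makerules([("23.94.240.207", "8443", "pixel.gif"),
                       ("43.137.8.159", "443", "cm")]):
    print(rule)
```

Because the SID is derived from the position in the feed, regenerating the rules after the feed changes will renumber them, which may or may not matter for your setup.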

Maybe Suricata should supply an http.url buffer for this use case…