Using Hazel to sort .webarchive files based on originating source URL?

Is there a way to get Hazel to filter on the source URL of a saved .webarchive file?

The obvious way is to use either the Source/URL Address or Where from conditions, but for some reason, .webarchive files don’t seem to have the originating URL in these metadata fields. I also confirmed using mdls that the URL isn’t part of that metadata.

Mysteriously though, if I drag saved .webarchive files into the “Keep It” app, it correctly identifies the originating URL where that .webarchive came from, despite there being no kMDItemWhereFroms metadata for that file:

So it seems that even though there’s no Spotlight metadata for the Source URL, that metadata is preserved somewhere in the .webarchive file. Is there a way to get Hazel to find and filter on this information?

A .webarchive file is simply a plist (usually a binary one) with a whole bunch of data, including a key called WebResourceURL, nested inside the top-level WebMainResource dictionary, that contains the originating URL. So you can read the webarchive file as a plist and extract the value of that key. You can do this using AppleScript, JavaScript, or a shell script, whichever you prefer.
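To make the plist approach concrete, here is a minimal Python sketch (not the AppleScript from my rule). It assumes the URL lives at WebMainResource > WebResourceURL, that Hazel hands the file path to the script as its first argument, and that an exit status of 0 counts as a match; check the Hazel documentation for the exact contract. TARGET and the function names are made up for illustration.

```python
#!/usr/bin/env python3
"""Exit 0 if a .webarchive's source URL contains TARGET, else exit 1."""
import plistlib
import sys

# Hypothetical value: replace with the site you want to match.
TARGET = "example.com"


def webarchive_source_url(path):
    """Return the originating URL stored in a .webarchive, or None."""
    try:
        with open(path, "rb") as f:
            archive = plistlib.load(f)  # detects binary or XML plists
    except Exception:
        # Unreadable file or not a plist: treat as "no URL found".
        return None
    if not isinstance(archive, dict):
        return None
    main = archive.get("WebMainResource")
    if not isinstance(main, dict):
        return None
    url = main.get("WebResourceURL")
    return url if isinstance(url, str) else None


def matches(path, needle):
    """True if the archive's source URL contains the given substring."""
    url = webarchive_source_url(path)
    return url is not None and needle in url


# Only act when a file path was actually passed on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(0 if matches(sys.argv[1], TARGET) else 1)
```

A plain substring test is the simplest thing that works; if you need to match on the host only, you could parse the URL with urllib.parse.urlsplit and compare its hostname instead.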

As a very simple example, here’s an AppleScript I used to match a webarchive against the URL of this forum and then in the action section (obscured in my screenshot) I just applied a label color:

See the Hazel documentation for more that you can do with scripts in rule conditions. As I said, the above is just a very simple proof of concept example.
