Hazel: rename a file using content from within the PDF

With Hazel I can apply rules to certain PDF files if they contain the text I specify.

I use this for invoices, for example.

What I want to do is create a nice folder with all invoices in order of the month they are for.

So far I have it renaming the file based on the date created (which is different from the date I downloaded it), and that works about 80% of the time.

But sometimes the date created doesn’t work for renaming (for example when the invoice was created in the next month, but applies to the previous month). I would end up with 2 files having the same name, and have to manually correct this.

In the content of this file there is the correct number to use… Is there any way I can rename the file to some field that can be found in the content?

I use a similar rule to rename and file invoices when they land in my Downloads folder, and then open them for printing.

I solved that problem with a command-line utility called pdftotext (part of the open-source XpdfReader), which I call via an AppleScript. Here’s the rule:

And the script, embedded in the first line of the rule:

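-- theFile is supplied by Hazel: the file that matched the rule
set itemPath to quoted form of POSIX path of theFile
tell application "Finder" to set fileName to name of theFile
-- Extract the PDF text to stdout and keep the line containing "Invoice For"
set clientLine to do shell script "/usr/local/bin/pdftotext -raw " & itemPath & " - | grep 'Invoice For'"
-- Drop the 12-character "Invoice For " prefix (assumes the line starts with it), then strip spaces
set clientName to ((characters 13 thru -1 of clientLine) as string)
set clientName to do shell script "echo " & quoted form of clientName & " | sed -e 's/ //g'"
-- Swap the "Creative_Q" placeholder in the file name for the client name
set fileName to do shell script "echo " & quoted form of fileName & " | sed -e 's/Creative_Q/" & clientName & "/g'"
-- Hand the new name back to Hazel as an exported token
return {hazelExportTokens:{fileName}}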

I don’t know if there’s an easier way to do this, or one without script dependencies, but this has worked reliably for me.
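For anyone who finds the sed steps hard to follow, here is the same text-handling logic as a minimal Python sketch. It is only an illustration of what the script does after pdftotext has produced the text: it is not something Hazel runs, the function name `client_file_name` and the sample text are made up, and it takes whatever follows the marker rather than counting 12 characters from the start of the line.

```python
def client_file_name(pdf_text: str, file_name: str,
                     marker: str = "Invoice For",
                     placeholder: str = "Creative_Q") -> str:
    """Swap the placeholder in file_name for the client named in the PDF text."""
    for line in pdf_text.splitlines():
        if marker in line:
            # Keep what follows the marker and drop all spaces
            # (the equivalent of the sed 's/ //g' step).
            client = line.split(marker, 1)[1].replace(" ", "")
            return file_name.replace(placeholder, client)
    # No marker line found: leave the name unchanged.
    return file_name


# Hypothetical text, standing in for the output of pdftotext.
sample_text = "Invoice 1042\nInvoice For Acme Widgets\nTotal due: 100.00"
print(client_file_name(sample_text, "Creative_Q_2019-06.pdf"))
# prints: AcmeWidgets_2019-06.pdf
```

One difference worth noting: when no "Invoice For" line exists, this sketch returns the original name instead of failing.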

I use dates within the PDF and rename the file based on the date format. David Sparks created a short video on how to do this. That is where I learned it.

The rule goes something like this: under Contents, choose “contain match”, select the custom date token, give it a name (I use “Date matched”), uncheck “detect date automatically”, then build the date pattern in the same format the file uses (you can also type slashes, commas, and spaces as needed), and click OK.

Then in the rename action, make sure to select the “Date matched” token you created above, plus whatever else you’d like in the name. I hope that’s clear and doesn’t confuse you. Look for that video; it is super helpful.

This is what I do when looking for a date in the document. I often precede the date variable with text to make sure I get the date I’m looking for.

For example, I might use:

Billing date: <<statement date>>

I then use the variable in the file name and adjust the output formatting to YYYY-MM-DD per my file naming conventions when dates are used.

Note that the input and output date formats for the variable can be different. This is quite powerful.
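To show what "different input and output formats" means in practice, here is a rough Python sketch of the same idea outside Hazel. The label "Billing date:" comes from the example above; the day/month/year input format, the function name `statement_date`, and the sample text are assumptions for illustration only.

```python
import re
from datetime import datetime
from typing import Optional


def statement_date(pdf_text: str, label: str = "Billing date:",
                   input_format: str = "%d/%m/%Y") -> Optional[str]:
    """Find the date that follows the label and return it as YYYY-MM-DD."""
    # Anchoring on the label avoids picking up other dates in the document.
    pattern = re.escape(label) + r"\s*(\d{1,2}/\d{1,2}/\d{4})"
    match = re.search(pattern, pdf_text)
    if match is None:
        return None
    # Parse with the format used in the file, then write it out in another one.
    parsed = datetime.strptime(match.group(1), input_format)
    return parsed.strftime("%Y-%m-%d")


# Hypothetical invoice: issued in July, but billed for June.
sample = "Invoice date: 02/07/2019\nBilling date: 30/06/2019\n"
print(statement_date(sample))
# prints: 2019-06-30
```

Preceding the date with its label is what keeps the July issue date from being picked up instead of the June billing date.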

Here’s something I am wondering if/how Hazel can do. In my work as a trial court judge, I often create multiple scheduling orders using Word and Excel to create a mail merge document as a PDF file. I then separate the individual orders by dragging the two pages in the order to the desktop. What I’d love Hazel to do is to “read” the contents, identify the case number in the order and then rename the file to match that case number. To be specific, near the top of each page in the caption (Party A vs. Party B) there will be a case number like 1906 PL 501. The next PDF might be 1906 PL 609. As I drop the new PDF file containing only 1906 PL 501 into the Hazel-monitored folder, I’d want Hazel to rename the file to 1906 PL 501 (or whatever the case number happens to be).

Something like this could be useful for anyone who might want to rename PDF files based on say an author’s name for academic articles etc.

I’ve hunted around but can’t seem to find any examples. The AppleScript stokini uses looks like it could be a start, but I could write everything I know about AppleScript inside a matchbook cover. :smile:

Ideally I’d like to find a resource online that can show me how to do it so I learn something. Then if I run into problems I could ask questions on this site.

Many thanks in advance!

I just saw a post in another thread that mentions Hazel fetching info out of a PDF file, so it apparently can be done. I’ve asked that post author what his rule looks like. I’ll share here when I find out.

That should be possible with Hazel’s built-in “match patterns” rule. No need for a script, I think. You can find more information about it here: https://www.noodlesoft.com/manual/hazel/attributes-actions/attribute-reference/using-custom-attributes/

Edit: to match “1906 PL 501” it should look something like this:

“TextToken” should then appear in the rename section.
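If it helps to see the matching logic spelled out, here is a small Python sketch of the same idea using a regular expression. This is not Hazel's pattern syntax; it assumes every case number is four digits, two capital letters, and three digits (inferred only from the two examples in the question), and the name `case_file_name` and the sample caption are made up.

```python
import re
from typing import Optional

# Assumed shape of a case number: 4 digits, 2 capital letters, 3 digits.
CASE_NUMBER = re.compile(r"\b\d{4} [A-Z]{2} \d{3}\b")


def case_file_name(pdf_text: str) -> Optional[str]:
    """Return '<case number>.pdf' for the first case number found, else None."""
    match = CASE_NUMBER.search(pdf_text)
    if match is None:
        return None
    return f"{match.group(0)}.pdf"


# Hypothetical caption text from the top of an order.
sample = "Party A vs. Party B\nCase No. 1906 PL 501\nScheduling Order"
print(case_file_name(sample))
# prints: 1906 PL 501.pdf
```

Only the first match is used, which fits the case where each PDF holds a single order.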

This looks fantastic. Thank you!

This worked perfectly! Thank you again.
