Using Hazel and OCR to rename pdf scripts?

lbdm · May 8, 2025, 4:07pm

Thanks for this. I’ve had a look at the link you kindly shared and have been experimenting. I’ve given up on dates for now, as I’d like to get the basics going first.

Taking my cue from this post here, I worked out that using pdftotext to pull the text from page 1 (post-OCR) and then using sed to pull individual lines was probably my safest bet, so I’ve attempted to retrofit the code in this post. The pdftotext part works fine; however, I cannot for love nor money get an AppleScript to pull the individual lines from the .txt file and tokenise them, even though the sed commands work fine in Terminal. Current end result: a load of files just being renamed ‘pdf’.
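
The Terminal side of that approach can be sketched roughly as below. This is a sketch under assumptions, not necessarily the exact commands from the linked post: `script.pdf` is a placeholder file name, `pdftotext` is assumed to be on the PATH, and `nth_line` and `front_page` are hypothetical helper names. One detail worth noting is the trailing `-`: without an output argument, pdftotext writes a .txt file next to the PDF instead of printing the text, so nothing would reach the pipe.

```shell
#!/bin/sh
# nth_line N: print only line N of whatever arrives on stdin (hypothetical helper).
nth_line() {
    sed -n "${1}p"
}

# front_page FILE: print the text of page 1 of FILE to stdout (hypothetical helper).
# The trailing "-" asks pdftotext to write to stdout rather than to FILE.txt.
front_page() {
    pdftotext -raw -f 1 -l 1 "$1" -
}

# Example usage (script.pdf is a placeholder):
#   front_page script.pdf | nth_line 1    # title line
#   front_page script.pdf | nth_line 2    # episode line
```

Keeping the line number as a parameter means each line can later become its own token without repeating the sed invocation.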

The logs don’t shed any light: according to Hazel, the rule is running successfully. Ideally I’d eventually like to make each line its own token to allow for formatting differences, but I need to be able to tokenise one first! The answer is certainly in my (total inability to) code. Where am I going wrong?

set itemPath to quoted form of POSIX path of theFile
tell application "Finder" to set fileName to name of theFile
set scriptFrontPage to do shell script "/opt/homebrew/Cellar/xpdf/4.05/bin/pdftotext -raw -simple2 -f 1 -l 1 " & itemPath
set scriptTitle to do shell script "echo " & quoted form of scriptFrontPage & " | sed -n '1'p"
set scriptEpisode to do shell script "echo " & quoted form of scriptFrontPage & " | sed -n '2'p"
set scriptWriter to do shell script "echo " & quoted form of scriptFrontPage & " | sed -n '3,4'p"
set fileName to do shell script "echo " & quoted form of scriptTitle
return {hazelExportTokens:{fileName}}
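
One possible contributor, offered as a guess rather than a diagnosis: by default `do shell script` hands its result back with line feeds converted to carriage returns, while sed splits its input on line feeds only. The sketch below reproduces that effect in plain shell using made-up sample text; `sample`, `first_line` and `first_line_cr_safe` are hypothetical helper names.

```shell
#!/bin/sh
# sample: made-up front-page text whose lines are separated by carriage
# returns, mimicking text that has passed through `do shell script`.
sample() {
    printf 'Title\rEpisode 3\rWriter\n'
}

# first_line: plain `sed -n '1p'`. sed sees no line feed until the very end,
# so the whole text counts as line 1.
first_line() {
    sed -n '1p'
}

# first_line_cr_safe: turn carriage returns back into line feeds first,
# then take line 1.
first_line_cr_safe() {
    tr '\r' '\n' | sed -n '1p'
}

# Example usage:
#   sample | first_line | tr '\r' '|'    # all three fields come back as one line
#   sample | first_line_cr_safe          # only the title comes back
```

On the AppleScript side, adding `without altering line endings` to a `do shell script` call keeps the line feeds (and the trailing newline) intact. Separately, and also as an assumption to check against the Hazel documentation, `hazelExportTokens` appears to expect a record of name/value pairs such as `{fileName:scriptTitle}` rather than a bare list.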