Automatically OCRing Documents with Hazel and PDFPen Pro

Take a look at this video.

Not sure there’s necessarily anything as a direct comparison, but if you search for “prizmo vs pdfpenpro” in your search engine of choice, you’ll find various top 10 app comparisons and some comparison of specific features within particular app reviews.

As for PDFpenPro recently adding OCR automation features: the option to automate OCR has been built in for many years. I can’t remember exactly how long I’ve been using it with Hazel to automate OCR (and other PDF stuff), but easily 5 or 6 years. What Smile did add in v10 this last year was an in-app batch OCR capability to make things easier.

A familiar sounding voice may have recorded a video about it…

Hope that helps.

“In-app batch”

Perhaps that’s what I remember reading. When I typically OCR, I have many PDFs to do, and need a simple iteration mechanism.

Just for the sake of alternatives: I use ABBYY FineReader to automate OCR and, especially important for me, to split landscape PDFs (of books) that have two pages per landscape page into two portrait pages and OCR the file in one setup. Super useful!

Did you ever get an answer on this? I did not know that DEVONthink Pro Office could do OCR. I understand that they are about to release a new version.

I am just a little nervous about putting my docs in a container.

Unfortunately it does not really work either. ALWAYS hangs after about 10-20 files.

MacSparky, can you call your friends at Smile and get them on the fix? It has been unreliable, I mean, DOES NOT WORK, for over a year.

I have reported it multiple times. No Joy.

Have you reported it as an issue to them? Their help desk has been excellent when I’ve had TextExpander issues. That should be the way to raise the issue, not asking someone else to do it on your behalf. That is because it could yet be something very specific to your setup, which could explain why the issue might have persisted for an extended period.

I’m still on version 9 I think and using a script driven approach. Never had any issues, but I’ve a different approach.

Guten Tag! Could you point me to the AppleScript for using ABBYY FineReader to automate OCR? I’m using Hazel to open PDFPenPro, and would rather use FineReader. The app is on sale now for “Black Friday”, and seeing that encouraged me to look at the Automators forum! Many thanks.

For what it’s worth, there’s a discussion thread on MacScripter for this that also highlights a version difference between the App Store release and the one direct from ABBYY.

Otherwise, if you search for abbyy finereader ocr applescript in your favourite search engine, I suspect you’ll find several similar examples in the top results.

Hope that helps.

Thanks, Stephen. I did that search but didn’t find a recent script posted. The link you shared looks to solve that problem! Regards!

I know this thread goes back a few years, but I’m having similar issues. I have been using Katie’s OCR script for PDFPenPro with Hazel 4 without issue, and with Hazel 5 since its release, also without issues. I’m on macOS 12.4, Hazel 5.1.2, and PDFPenPro 12.2.3. I suddenly started receiving errors like this, constantly, on every scanned PDF regardless of source. That includes old PDFs that were processed successfully in the past; for testing, I manually cleared their OCR layer and tried them again.

I haven’t changed the Hazel rules, config, or the AppleScript. I have tried the newer variations of the script with additional checks to see if PDFs need to be OCR’d, and I have even completed a reinstall of macOS, but the issue persists.

Is anyone else seeing similar issues, or does anyone have a suggestion for resolving them?

My solution to add to the throng.

  • brew install ocrmypdf

  • Hazel rule running this bash/zsh script:

# Only OCR when the PDF has no font resources (i.e. no text layer yet).
if ! grep -q Font "$1"; then
    ocrmypdf -l eng "$1" "$1"
    sleep 5
fi
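
For anyone adapting this, here is a slightly more defensive sketch of the same idea. The function names `needs_ocr` and `ocr_if_needed` are made up for illustration and are not part of Hazel or ocrmypdf. It assumes ocrmypdf is on the PATH that Hazel's shell sees and that Hazel passes the matched file as `$1`. Treat the `/Font` check as a rough heuristic only: a PDF whose objects are stored in compressed streams may not show the string even when it already has text.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit 0) when the PDF shows no font
# resources, which usually means an image-only scan with no text layer.
# -a treats the binary PDF as text; -q suppresses grep's output.
needs_ocr() {
    ! grep -a -q '/Font' "$1"
}

# Hypothetical helper: OCR the file in place, but only if it seems to need it.
ocr_if_needed() {
    if needs_ocr "$1"; then
        ocrmypdf -l eng "$1" "$1"
    fi
}

# Hazel passes the matched file's path as the first argument.
if [ -n "${1:-}" ]; then
    ocr_if_needed "$1"
fi
```

Splitting the check into its own function makes it easy to try on a couple of known files from the terminal before trusting it inside a Hazel rule.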

Thanks, I appreciate it. However, I have found that I get better results from PDFPenPro than from Tesseract. That’s a fallback option at least.