Create OCR Layer in PDF with Hazel - FREEWARE (solved)

FlohGro · April 23, 2019, 1:09pm

Hi Guys,
I found some topics about Hazel and OCR but it seems like most of you are using PDFpen Pro to get the OCR layer.
I wanted to ask if someone has a free tool which can accomplish the same. I don't need a new PDF editor, because I have PDF Expert and it fits everything I want to do, except the OCR thing. I already mailed them, but as far as I know this feature is not available.

What I want is a tool (maybe a terminal tool; I think there is something available with brew) which can apply the OCR layer in the background.

EDIT: solution in last reply

Isaac · April 23, 2019, 5:33pm

I tried using Tesseract for a workflow when I was trying to replicate Evernote’s ability to scan images for words. I switched to a DEVONthink Pro workflow because Evernote got creepy. It worked okay, but it was not the most accurate tool for my task.

Kaitlin · April 23, 2019, 9:00pm

Not free, exactly, but I played around with Prizmo, which is included in SetApp, for this. I did get something working at the time (this would have been about a year ago now) as the app has an Automator action. However it took FOREVER to run and I never took the time to look into it further.

PDFpen is included in SetApp now too, though, which may mean the above is irrelevant—being currently Mac-less I haven’t looked at it again.

tbrown313 · April 25, 2019, 6:45pm

Take a look at ocrmypdf. I have used it quite a bit for documents I scan with my phone. Free and capable, with good documentation.

I used to use DEVONthink, but have moved away from it because this seems to work nearly as well and is more convenient for what I do.

FlohGro · May 8, 2019, 5:07pm

For anyone interested:
Thanks to @tbrown313, I installed OCRmyPDF with Homebrew (HERE are the instructions).
To automatically scan the incoming PDFs, I created the following Hazel rule:

The embedded script is quite simple:

ocrmypdf -l deu+eng "$1" "$1" --skip-text

The --skip-text option skips every page which already contains text (scans from my iPhone already contain an OCR layer).
You can change the languages (in my case German and English) according to your personal needs.
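
A note on running this from Hazel: a shell exit status of 127 generally means "command not found". One plausible cause (an assumption, not something confirmed in this thread) is that Hazel's embedded scripts run with a minimal PATH that does not include Homebrew's bin directory, so neither ocrmypdf nor the tools it calls can be found. A minimal sketch that prepends the usual Homebrew locations first; the directory list and the helper name ocr_in_place are assumptions for illustration:

```shell
# Prepend the usual Homebrew bin directories (assumed locations) so that
# ocrmypdf and the tools it calls, such as tesseract, can be found.
PATH="/opt/homebrew/bin:/usr/local/bin:$PATH"
export PATH

# Hypothetical helper: run OCR in place, or fail with status 127
# and a readable message if ocrmypdf is still not on the PATH.
ocr_in_place() {
  if ! command -v ocrmypdf >/dev/null 2>&1; then
    echo "ocrmypdf not found on PATH: $PATH" >&2
    return 127
  fi
  ocrmypdf -l deu+eng --skip-text "$1" "$1"
}

# Hazel passes the matched file as $1; do nothing when no file is given.
if [ -n "${1:-}" ]; then
  ocr_in_place "$1"
fi
```

Running "which ocrmypdf" in the terminal shows where Homebrew actually installed it, in case it is in neither of the two directories assumed above.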

jeff_c · April 14, 2021, 9:46pm

Is anyone still using this? I can’t seem to get the embedded script FlohGro typed up to work. The log doesn’t give me anything useful: Shellscript exited with non-successful status code: 127.

Anyone have an idea for me?

FlohGro · April 20, 2021, 10:10pm

Did you install OCRmyPDF?

jeff_c · April 21, 2021, 4:49pm

Yes! It works great as a command in the terminal, just not as a script in Hazel. It seems like (the error) has something to do with the file referencing “$1” “$1.” Is it still working for you?

FlohGro · April 21, 2021, 5:09pm

Ok… unfortunately I’m not using the script anymore (I let DEVONthink OCR everything)

JaxkImari · April 27, 2021, 2:28pm

Why is “$1” there twice? Also, shouldn’t there be a named output file at the end, after “skip-text”?

FlohGro · April 27, 2021, 4:04pm

If you want to add an OCR layer, the command:

ocrmypdf Input.pdf Output.pdf

is referenced in the “cookbook” of OCRmyPDF.

That’s exactly what the double “$1” is doing: in my use case I didn’t want to receive another file, but instead overwrite the current one.

You can of course modify this until it fits your needs.

As I said, this worked for me until I switched to DEVONthink.
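
For anyone who would rather keep the original file, here is a rough sketch of the "Input.pdf Output.pdf" form with a separate output name. The "-ocr" suffix and the helper names ocr_output_name and ocr_to_copy are made up for illustration; only the ocrmypdf options come from the script above:

```shell
# Hypothetical helper: derive an output name next to the input,
# e.g. "scan.pdf" -> "scan-ocr.pdf" (the "-ocr" suffix is an assumption).
ocr_output_name() {
  printf '%s\n' "${1%.pdf}-ocr.pdf"
}

# Hypothetical helper: write the OCR result to a separate file
# instead of overwriting the input.
ocr_to_copy() {
  ocrmypdf -l deu+eng --skip-text "$1" "$(ocr_output_name "$1")"
}

# Hazel passes the matched file as $1; do nothing when no file is given.
if [ -n "${1:-}" ]; then
  ocr_to_copy "$1"
fi
```

If the new file lands in the folder Hazel is watching, the rule will probably match it again, so a rule condition that excludes names ending in "-ocr" is likely needed.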

swissbird · October 22, 2022, 2:03pm

Thanks for this script; using it together with Hazel is the best option for me. I installed OCRmyPDF and Tesseract as well, but I always get an error in Script Editor for the part “l deu” (German and English are relevant for me as well, mostly German).

What could be the problem here? I’m at a loss.

FlohGro · October 22, 2022, 5:57pm

Can you post the error? Did you copy the command from above including the dash „-“?