Create OCR Layer in PDF with Hazel - FREEWARE (solved)

Hi Guys,
I found some topics about Hazel and OCR but it seems like most of you are using PDFpen Pro to get the OCR layer.
I wanted to ask if someone has a free tool which can accomplish the same. I don't need a new PDF editor, because I have PDF Expert and it does everything I want, except the OCR thing. I already mailed them, but as far as I know this feature is not available.

What I want is a tool which can apply the OCR layer in the background, maybe a terminal tool (I think there is something available with brew).

EDIT: solution in last reply


I tried using Tesseract for a workflow when I was trying to replicate Evernote’s ability to scan images for words. I switched to a DEVONthink Pro workflow because Evernote got creepy. Tesseract worked okay, but it was not the most accurate tool for my task.


Not free, exactly, but I played around with Prizmo, which is included in SetApp, for this. I did get something working at the time (this would have been about a year ago now) as the app has an Automator action. However it took FOREVER to run and I never took the time to look into it further.

PDFpen is included in SetApp now too, though, which may mean the above is irrelevant. Being currently Mac-less, I haven’t looked at it again.


Take a look at ocrmypdf. I have used it quite a bit for documents I scan with my phone. Free and capable, with good documentation.

I used to use DEVONthink, but have moved away from it because ocrmypdf seems to work nearly as well and is more convenient for what I do.


For anyone interested:
Thanks to @tbrown313, I installed OCRmyPDF with Homebrew (HERE is the instruction).
To automatically OCR the incoming PDFs, I created the following Hazel rule:

The embedded script is quite simple:

ocrmypdf -l deu+eng "$1" "$1" --skip-text

The --skip-text option skips every page which already contains text (scans from my iPhone already contain an OCR layer).
You can change the languages (in my case German and English) according to your personal needs.
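To make the language part easier to adapt, here is a minimal sketch of how the value for -l can be built. The `join_langs` helper and the `LANGS` variable are names made up for this example, not part of ocrmypdf. It assumes the language packs you name are installed for Tesseract, which ocrmypdf uses for OCR; `tesseract --list-langs` should show what is available on your machine.

```shell
#!/bin/bash
# Hypothetical helper: join language codes with "+", the form used
# with -l in the script above (e.g. deu+eng).
# The body runs in a subshell, so changing IFS does not leak out.
join_langs() (
    IFS='+'
    printf '%s\n' "$*"
)

# Assumption: these packs are installed; check with `tesseract --list-langs`.
LANGS="$(join_langs deu eng)"

# Hazel passes the matched file as $1; do nothing when run without arguments.
if [ -n "${1:-}" ]; then
    ocrmypdf -l "$LANGS" --skip-text "$1" "$1"
fi
```

Swapping `deu eng` for other codes, for example `fra eng`, is then a one-word change.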


Is anyone still using this? I can’t seem to get the embedded script FlohGro typed up to work. The log doesn’t give me anything useful: Shellscript exited with non-successful status code: 127.

Anyone have an idea for me?

Did you install OCRmyPDF?

Yes! It works great as a command in the terminal, just not as a script in Hazel. It seems like (the error) has something to do with the file referencing “$1” “$1.” Is it still working for you?
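A note on the error itself: exit status 127 is what a shell reports when it cannot find a command, so the two "$1" arguments are probably not the cause. A likely explanation (an assumption, not something confirmed in this thread) is that Hazel's embedded script does not run with the same PATH as your interactive Terminal, so a Homebrew-installed ocrmypdf is not found. A minimal sketch that adds the usual Homebrew locations first; `locate_ocrmypdf` is a hypothetical helper name:

```shell
#!/bin/bash
# Assumption: Homebrew's default prefixes (/opt/homebrew on Apple Silicon,
# /usr/local on Intel). Adjust if `which ocrmypdf` in Terminal says otherwise.
PATH="/opt/homebrew/bin:/usr/local/bin:$PATH"
export PATH

# Hypothetical helper: print where ocrmypdf was found, or fail with 127
# (the same status the shell uses for "command not found").
locate_ocrmypdf() {
    command -v ocrmypdf || {
        echo "ocrmypdf not found on PATH: $PATH" >&2
        return 127
    }
}

# Hazel passes the matched file as $1; do nothing when run without arguments.
if [ -n "${1:-}" ]; then
    locate_ocrmypdf >/dev/null && ocrmypdf -l deu+eng --skip-text "$1" "$1"
fi
```

If the script still fails, the message written to stderr shows which PATH the script actually saw, which is more useful than the bare 127 in the log.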

Ok… unfortunately I’m not using the script anymore (I let DEVONthink OCR everything)

Why is “$1” there twice? Also, shouldn’t there be a named output file at the end, after “skip-text”?

If you want to add an OCR layer, the command

ocrmypdf Input.pdf Output.pdf

is referenced in the “cookbook” of ocrmypdf.

That’s exactly what the double “$1” is doing: the first is the input file and the second is the output file. In my use case I didn’t want to receive another file; I wanted to overwrite the current one.

You can of course modify this until it fits your needs.
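For example, if you would rather keep the original and get a second file, a minimal sketch could derive the output name from the input. The `ocr_output_name` helper and the `_ocr` suffix are made up for this example:

```shell
#!/bin/bash
# Hypothetical helper: "scan.pdf" -> "scan_ocr.pdf".
# Only a lowercase ".pdf" ending is stripped; other names just get
# "_ocr.pdf" appended.
ocr_output_name() {
    printf '%s_ocr.pdf\n' "${1%.pdf}"
}

# Hazel passes the matched file as $1; do nothing when run without arguments.
if [ -n "${1:-}" ]; then
    ocrmypdf -l deu+eng --skip-text "$1" "$(ocr_output_name "$1")"
fi
```

Keep in mind that the new file lands in the same folder, so your Hazel rule may need a condition to ignore files ending in `_ocr`.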

As I said, this worked for me until I switched to DEVONthink.