I found some topics about Hazel and OCR but it seems like most of you are using PDFpen Pro to get the OCR layer.
I wanted to ask if someone has a free tool which can accomplish the same. I dont need a new PDF editor, because I have PDF expert and it fits for everything I want to do, exept the OCR thing. I already mailed them but as far as i know this feature is not available.
What I want is (maybe a terminal tool - I think there is something available with brew) a tool which can apply the OCR layer in the background.
EDIT: solution in last reply
I tried using Tesseract for a workflow when I was trying to replicate Evernote’s ability to scan images for words. I switched to a Devonthink Pro workflow because Evernote got creepy. It worked okay, but it was not the most accurate tool for my task.
Not free, exactly, but I played around with Prizmo, which is included in SetApp, for this. I did get something working at the time (this would have been about a year ago now) as the app has an Automator action. However it took FOREVER to run and I never took the time to look into it further.
PDFPen is included in SetApp now too, though, which may mean the above is irrelevant—being currently Mac-less I haven’t looked at it again.
Take a look at ocrmypdf. I have used it quite a bit for documents I scan with my phone. Free and capable, with good documentation.
I used to use DevonThink, but have moved away from it because this seems to work nearly as well and is more convenient for what I do.
For anyone interested,
thanks to @tbrown313, I installed OCRmyPDF with homebrew (HERE is the instruction).
to automatically scan the incoming PDFs I created the following hazel rule:
the embedded script is quite simple:
ocrmypdf -l deu+eng "$1" "$1" --skip-text
the --skip-text jumps over every page which already contains text (scans from my iPhone already contain a OCR layer).
you can change the languages (in my case German and English) according to your personal needs.
Is anyone still using this? I can’t seem to get the embedded script FlohGro typed up to work. The log doesn’t give me anything useful: Shellscript exited with non-successful status code: 127.
Anyone have an idea for me?
Did you install OCRmyPDF?
Yes! It works great as a command in the terminal, just not as a script in Hazel. It seems like (the error) has something to do with the file referencing “$1” “$1.” Is it still working for you?
Ok… unfortunately I’m not using the script anymore (I let DEVONthink OCR everything)
Why is “$1” there twice? Also, shouldn’t there be a named output file at the end, after “skip-text”?
If you want to add a OCR layer the command:
ocrmypdf Input.pdf Output.pdf
is referenced in the „cookbook“ of ocrmypdf
That’s exactly what the double „$1“ is doing - in my usecase I didn’t want to receive another file and instead overwrite the current one.
You can of course modify this until it fits your need.
As I said, this worked for me until I switched to DEVONthink