Automatically OCRing Documents with Hazel and PDFPen Pro

Thanks for that info.

I haven’t logged the issue with Smile. My bad. I will do it the next time it happens. Thanks for your suggestion.

I’ve used a slightly modified version of the script that only runs OCR if the PDF needs it (it skips PDFs that already have a text layer) and doesn’t quit PDFpenPro if another document is open:

tell application "PDFpenPro"
	open theFile as alias
	-- Does the document need to be OCR'd?
	if (needs ocr of document 1) then
		tell document 1
			ocr
			-- Wait for OCR to finish before saving.
			repeat while performing ocr
				delay 1
			end repeat
			delay 1
			close with saving
		end tell
	else
		-- The document was previously OCR'd or is already a text-type PDF.
		tell document 1
			close without saving
		end tell
	end if
	-- In PDFpen, when no documents are open, window 1 is "Preferences".
	-- If other documents are open, do not quit the app.
	if name of window 1 is "Preferences" then quit
end tell

I offload this sort of processing to PDFpen Pro on my Mac mini server, so I always quit the app when it is finished. When I open it on my MacBook Pro, it remembers which PDFs I was last working with, so I don’t think checking how many files are open would work correctly for me if I were using it there. I figure it would almost always stay open.

Maybe checking whether PDFpen Pro is already running before opening the file to OCR would be more reliable for those who have files automatically reopen in the app?
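A minimal sketch of that idea, assuming Hazel supplies the matched file as theFile (as in the script above) and that PDFpenPro's needs ocr and performing ocr properties behave as they do there. The variable name wasRunning is made up for illustration. It records whether the app was already running before the file is opened, and quits only if the script was what launched it:

```applescript
-- theFile is assumed to be provided by Hazel for the matched file.
-- Record whether PDFpenPro was already running before we touch it.
-- (wasRunning is a hypothetical name used only in this sketch.)
set wasRunning to application "PDFpenPro" is running

tell application "PDFpenPro"
	open theFile as alias
	if (needs ocr of document 1) then
		tell document 1
			ocr
			-- Wait for OCR to finish before saving.
			repeat while performing ocr
				delay 1
			end repeat
			delay 1
			close with saving
		end tell
	else
		-- Already has a text layer, so leave the file untouched.
		tell document 1
			close without saving
		end tell
	end if
	-- Quit only if this script launched the app.
	if not wasRunning then quit
end tell
```

Because the decision is made before the file is opened, it does not depend on which windows or restored documents happen to be showing afterwards.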

Checking whether OCR is necessary is a really nice enhancement. Whilst I think I only send non-OCR’d PDFs to be OCR’d, this would remove any need to check on my part, and I’m all for that.

@jmreekes and @RosemaryOrchard, is there an automation like this one that uses DEVONthink Pro Office to do the OCRing instead of PDFpen Pro?

I used AppleScript to script the UI in Acrobat X in order to OCR a file.

tell application "Adobe Acrobat Pro"
	activate
	set theFile to "Macintosh HD:Users:Jim:Documents:img0005.pdf"
	open theFile as alias
	delay 2
	activate
	tell application "System Events"
		tell application process "Acrobat"
			-- run the Action Wizard action that does the OCR
			click menu item "OCR_file" of menu 1 of menu item "Action Wizard" of menu 1 of menu bar item "File" of menu bar 1
			delay
			-- perform the OCR
			click button "Next" of window 1
			
			-- poll until the "Close" button can be clicked, which means OCR is done
			set OCRdone to false
			repeat while OCRdone is false
				try
					click button "Close" of window 1 -- errors until OCR is done
					set OCRdone to true
				on error
					delay 1 -- wait before retrying instead of spinning
				end try
			end repeat
		end tell
		keystroke "w" using command down
	end tell
	quit
end tell

Hey Folks,

I wanted to mention that Prizmo with the Pro Pack (also available with a Setapp subscription) has hooks that Automator can access. I have a similar setup to the ones described above: Hazel watches my Downloads folder, looks for image-only PDFs, and automatically OCRs them. This workflow requires no scripting, which is perhaps of interest to folks out there who, like me, can’t code yet!

Thanks for sharing this. I just set this up and tried it out on a couple documents, and the documents I got back looked to be the OCR layer only (i.e., the visible appearance of the document was changed). Do you know if there is a way to get a “normal” OCR’d document via Prizmo and Automator, with the visible part of the PDF appearing unchanged, but containing the invisible OCR layer?

EDIT: Serves me right for not digging into this more before posting. There is a setting in the Automator workflow to use "PDF (Image + Searchable Text)".

Does anyone have an idea how something similar could be achieved (with Hazel) for business cards, where ideally the scan is OCR’d, the contact information is added to my contacts, and the scan is then archived?

Many Thanks.

Sometimes I face the same issue. Next time it happens to me, I will also report it to Smile.

Hi David. I’m experienced with Hazel and with OCRing via ScanSnap and PDFpenPro. I’m new to Automator and Prizmo. How do I “hook” Automator into Prizmo? Thanks, Joe

Thanks for sharing this. I do want to explore this as an alternative approach to OCRing PDFs.

As a side note, recent versions of PDFpen Pro have some OCR automation built in. I haven’t tried it yet, though.

Has anyone seen or published comparisons between Prizmo and PDFpen Pro?

Take a look at this video.

I’m not sure there’s a direct comparison out there, but if you search for “prizmo vs pdfpenpro” in your search engine of choice, you’ll find various top 10 app comparisons, plus comparisons of specific features within individual app reviews.

As for PDFpen Pro recently adding OCR automation features, the option to automate OCR has been built in for many years. I can’t remember how long I’ve been using it with Hazel to automate OCR (and other PDF stuff), but easily 5 or 6 years. What Smile did add in v10 this last year was an in-app batch OCR capability to make things easier.

A familiar sounding voice may have recorded a video about it…

Hope that helps.

“In-app batch”

Perhaps that’s what I remember reading. When I typically OCR, I have many PDFs to do, and need a simple iteration mechanism.

Just for the sake of alternatives: I use ABBYY FineReader to automate OCR and, especially important for me, to split landscape PDFs (of books) that have two book pages per landscape page into two portrait pages and OCR the file in one pass. Super useful!

Did you ever get an answer on this? I did not know that DEVONthink Pro Office could do OCR. I understand that they are about to release a new version.

I am just a little nervous about putting my docs in a container.

Unfortunately, it does not really work either. It ALWAYS hangs after about 10-20 files.

MacSparky, can you call your friends at Smile and get them on the fix? It has been unreliable, I mean, DOES NOT WORK, for over a year.

I have reported it multiple times. No joy.

Have you reported it as an issue to them? Their help desk has been excellent when I’ve had TextExpander issues. That should be the way to raise the issue, rather than asking someone else to do it on your behalf, because it could be something very specific to your setup, which would explain why the issue has persisted for so long.

I’m still on version 9, I think, and using a script-driven approach. I’ve never had any issues, but mine is a different approach.