AppleScript to be added to Hazel, script will clear the current OCR layer and run OCR again

RACsMac · September 10, 2018, 2:31am

I have quite a few large databases in DEVONthink Pro Office that need some special attention.
When I initially imported the files, the copies were of poor quality and many of the pages were skewed. All I can say is thank God I've got MPU, because after working 12 to 16 hours a day trying to get a handle on the nightmare, I so looked forward to my drive home so I could learn from David and Katie.
As I said earlier, when I first imported the scanned documents into DEVONthink Pro Office, I was unaware that the quality of the PDF would affect the OCR layer. I was also unaware that if a page of a scanned document is sideways or slightly skewed, the OCR might not work correctly. When the dust settled, I had well over 85,000 files in 9 databases.
To combat the issue, I was pulling folders out onto my desktop and then using the space bar to preview each file quickly... but then there was the issue of the OCR layer.

I made this macro. I also used the Karabiner-Elements app, mainly because Brett Terpstra said it was cool.

Is there an AppleScript I can plug into Hazel that will first clear the OCR layer and then reapply it?

Hopefully, some of you super smart folks can help me out.

Ashley · September 11, 2018, 10:08pm

I use PDFpenPro for this. I don’t recall who wrote the script, but it was probably someone from the MacSparky/Katie universe. Anyhow, it is very slow. Maybe you can adapt it to DEVONthink to work behind the scenes?

tell application "PDFpenPro"
	open theFile as alias

	-- Remove the OCR layer from the document.
	-- This only strips the OCR layer; it doesn't impact "real text" PDFs.
	activate
	delay 2

	tell application "System Events"
		-- This is the keyboard shortcut to remove the OCR layer
		keystroke "o" using {command down, option down, control down}
	end tell

	-- Delay required for the "remove OCR layer" step to take effect;
	-- without it, testing the document will claim it doesn't need OCR.
	delay 2

	-- Does the document need to be OCR'd?
	if needs ocr of document 1 then
		tell document 1
			ocr
			repeat while performing ocr
				delay 1
			end repeat
			delay 1
			close with saving
		end tell
	else
		-- The scanned document was previously OCR'd or is already a text-type PDF.
		tell document 1
			close without saving
		end tell
	end if

	-- In PDFpen, when no documents are open, window 1 is "Preferences".
	-- If other documents are open, do not quit the app.
	if name of window 1 is "Preferences" then
		quit
	end if
end tell
-- Without this, sometimes it seems to kick off this same script with multiple matches at once
delay 2

Mmandell · July 19, 2023, 2:54am

Does this still work on the new version of macOS?

My OCR script seems to be broken after upgrading macOS to Ventura 13.4.1.

ghui · September 17, 2023, 2:45pm

Hi, I’m getting a syntax error: Expected “end” or “end tell” but found unknown token. It seems to be related to the step that removes the OCR layer. Does anyone have an idea why this is no longer working?