AppleScript for Hazel: clear the current OCR layer and run OCR again

I have quite a few large databases in DEVONthink Pro Office, and they need some special attention.
When I initially imported the files, the copies were of poor quality and many of the pages were skewed. All I can say is thank God I've got MPU, because after working 12 to 16 hours a day trying to get a handle on the nightmare, I so looked forward to my drive home so I could learn from David and Katie.
As I said earlier, when I first imported the scanned documents into DEVONthink Pro Office, I was unaware that the quality of the PDF would affect the OCR layer. I was also ignorant of the fact that if a page of a scanned document is sideways or slightly skewed, the OCR might not work correctly. When the dust settled, I had well over 85,000 files in 9 databases.
To combat the issue, I was pulling folders out onto my desktop and then using the space bar to preview each file quickly... but then there was the issue of the OCR layer.

I made this macro. I also used the Karabiner-Elements app, mainly because Brett Terpstra (:beers:) said it was cool.

Is there an AppleScript I can plug into Hazel that will first clear the OCR layer and then reapply it?

Hopefully, some of you super smart folks can help me out.

I use PDFpenPro for this. I don’t recall who wrote the script, but it was probably someone from the MacSparky/Katie universe. Anyhow, this is very slow. Maybe you can adapt it to DEVONthink to work behind the scenes?

tell application "PDFpenPro"
	-- theFile is supplied by Hazel when this runs as an embedded script
	open theFile as alias

	-- Remove the OCR layer from the document.
	-- This only strips the OCR layer; it doesn't affect "real text" PDFs.
	activate
	delay 2

	tell application "System Events"
		-- This is the keyboard shortcut to remove the OCR layer
		keystroke "o" using {command down, option down, control down}
	end tell

	-- Without this delay, testing the document will claim it doesn't need OCR;
	-- the "remove OCR layer" step needs time to take effect.
	delay 2

	-- Does the document need to be OCR'd?
	if needs ocr of document 1 then
		tell document 1
			ocr
			-- Wait for OCR to finish before saving
			repeat while performing ocr
				delay 1
			end repeat
			delay 1
			close with saving
		end tell
	else
		-- The scan was previously OCR'd or is already a text-type PDF.
		tell document 1
			close without saving
		end tell
	end if

	-- In PDFpen, when no documents are open, window 1 is "Preferences".
	-- If other documents are open, do not quit the app.
	if name of window 1 is "Preferences" then
		quit
	end if
end tell
-- Without this, it sometimes seems to kick off this same script with multiple matches at once.
delay 2