I have quite a few large databases in DEVONthink Pro Office, and they need some special attention.
When I initially imported the files, the copies were of poor quality and many of the pages were skewed. All I can say is thank God I had MPU, because after working 12 to 16 hours a day trying to get a handle on the nightmare, I so looked forward to my drive home so I could learn from David and Katie.
As I said earlier, when I first imported the scanned documents into DEVONthink Pro Office, I was unaware that the quality of the PDF would affect the OCR layer. I was also ignorant of the fact that if a page of a scanned document is sideways or slightly skewed, the OCR may not work correctly. When the dust settled, I had well over 85,000 files in 9 databases.
To combat the issue, I was pulling folders out onto my desktop and then using the space bar to preview each file quickly... but then there was still the issue of the OCR layer.
I use PDFpenPro for this. I don’t recall who wrote the script, but it was probably someone from the MacSparky/Katie universe. Anyhow, it is very slow. Maybe you can adapt it to DEVONthink so it works behind the scenes?
tell application "PDFpenPro"
    -- theFile is not defined in this snippet; it must be supplied by whatever runs the script
    open theFile as alias

    -- remove OCR layer from the document
    -- this only strips the OCR, doesn't impact "real text" PDFs.
    activate
    delay 2
    tell application "System Events"
        -- This is the keyboard shortcut to remove the OCR layer
        keystroke "o" using {command down, option down, control down}
    end tell

    -- without this delay, testing the document will claim it doesn't need OCR
    -- delay required for the "remove OCR layer" step to take effect
    delay 2

    -- does the document need to be OCR'd?
    if needs ocr of document 1 then
        tell document 1
            ocr
            -- wait until PDFpenPro reports that OCR has finished
            repeat while performing ocr
                delay 1
            end repeat
            delay 1
            close with saving
        end tell
    else
        -- Scan Doc was previously OCR'd or is already a text type PDF.
        tell document 1
            close without saving
        end tell
    end if

    -- In PDFpen, when no documents are open, window 1 is "Preferences"
    -- If other documents are open, do not close the App.
    if name of window 1 is "Preferences" then
        quit
    end if
end tell

-- without this, sometimes it seems to kick off this same script with multiple matches at once
delay 2