Programmatically hacking PDF files


#1

Has anyone made progress with parsing PDFs, or with updating them programmatically, e.g. by adding bookmarks?

It’s something I want to bite off one day.

I would ask this question for iOS as well.


#2

I’d start with Python and PyPDF2. Whatever you write should work on the Mac and in Pythonista on iOS with little modification.

I’ve only just explored it with this Alfred workflow but there’s a lot more it can do. See https://www.alfredforum.com/topic/9276-alfred-pdf-tools-–-optimize-and-manipulate-pdf-files/ for examples.

Also look at the open source version of ReportLab if you want to generate documents.
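
For the bookmark half of the question, here is a rough sketch of what adding outline entries with PyPDF2 might look like. It assumes a PyPDF2 3.x install, where the classes are `PdfReader` and `PdfWriter` and the call is `add_outline_item`; older releases used different names. The helper names `to_outline_entries` and `add_bookmarks` are made up for illustration.

```python
def to_outline_entries(bookmarks, page_count):
    """Turn (1-based page, title) pairs into sorted (0-based index, title) pairs.

    Entries with a blank title or a page outside the document are dropped.
    """
    entries = [
        (page - 1, title.strip())
        for page, title in bookmarks
        if 1 <= page <= page_count and title.strip()
    ]
    return sorted(entries, key=lambda entry: entry[0])


def add_bookmarks(src_path, dst_path, bookmarks):
    """Copy src_path to dst_path, adding one outline item per bookmark."""
    # Imported here so the helper above still works without PyPDF2 installed.
    # Assumption: PyPDF2 3.x naming (PdfReader / PdfWriter / add_outline_item).
    from PyPDF2 import PdfReader, PdfWriter

    reader = PdfReader(src_path)
    writer = PdfWriter()
    for page in reader.pages:
        writer.add_page(page)

    for index, title in to_outline_entries(bookmarks, len(reader.pages)):
        writer.add_outline_item(title, index)  # index is 0-based

    with open(dst_path, "wb") as fh:
        writer.write(fh)
```

Something like `add_bookmarks("in.pdf", "out.pdf", [(1, "Intro"), (3, "Results")])` would then write a bookmarked copy. Copying page by page like this probably won't carry over any outline the source file already had, so treat it as a starting point.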


#3

Thanks @dfay. I would consider parsing a PDF, but putting it back together with edits would probably be a stretch. So a tool that might take the fiddliness out of it is a good thing.

And PyPDF2 is something I am aware of. Will need to experiment.


#4

It is not programming, but I use a combination of Hazel and ABBYY FineReader to accomplish this. It does not work every time, but it works reasonably well.


#5

Thanks @E_Thelonius! Can you give some idea of what you manage to achieve with that?


#6

ABBYY FineReader recognizes the structure of a PDF and builds a table of contents and an index from it. So it does not really parse PDFs by syntax but by visual structure (font size etc.). I just reread your initial post and think I misunderstood you at first: grabbing the syntax of sentences is not possible with ABBYY FineReader, as far as I know…


#7

Well, my first use case would be: “there’s a 24-point piece of text on this page; make a bookmark for this page, optionally with the text.”

If I achieved that it’d be a bit of a hack. Ideally I’d add the bookmarks at source but I can’t do that.
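
A minimal sketch of how that 24-point rule might look in Python. The selection logic is plain Python; `collect_spans` assumes PyPDF2's `extract_text(visitor_text=...)` hook, which reports a nominal font size per text run. That size may not match what is rendered if the page scales its text, so the tolerance is a guess, and both function names are hypothetical.

```python
def pick_headings(spans, size=24.0, tolerance=0.5):
    """Return one (page_index, title) per page: the first run close to `size`.

    `spans` is an iterable of (page_index, font_size, text) tuples.
    """
    headings = {}
    for page_index, font_size, text in spans:
        title = " ".join(text.split())  # collapse stray whitespace
        if not title or abs(font_size - size) > tolerance:
            continue
        headings.setdefault(page_index, title)  # keep the first match only
    return sorted(headings.items())


def collect_spans(reader):
    """Yield (page_index, font_size, text) for each text run PyPDF2 reports.

    Assumption: `reader` is a PyPDF2 PdfReader and its extract_text accepts
    a visitor_text callback taking (text, cm, tm, font_dict, font_size).
    """
    for page_index, page in enumerate(reader.pages):
        found = []

        def visitor(text, cm, tm, font_dict, font_size):
            found.append((page_index, font_size, text))

        page.extract_text(visitor_text=visitor)
        yield from found
```

The pairs from `pick_headings(collect_spans(reader))` could then be passed to PyPDF2's `PdfWriter.add_outline_item(title, page_index)`. A heading split across several text runs would only contribute its first run, which is part of why this stays a hack.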