Programmatically hacking PDF files

Martin_Packer · February 2, 2019, 9:05am

Has anyone made progress with parsing PDFs, or with updating them programmatically, e.g. by adding bookmarks?

It’s something I want to bite off one day.

I would ask this question for iOS as well.

dfay · February 2, 2019, 10:16am

I’d start with Python and PyPDF2. Whatever you write should work on the Mac and in Pythonista on iOS with little modification.

I’ve only just explored it with this Alfred workflow but there’s a lot more it can do. See https://www.alfredforum.com/topic/9276-alfred-pdf-tools-–-optimize-and-manipulate-pdf-files/ for examples.

There’s also the open-source version of ReportLab if you want to generate documents.
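For the bookmark side, here is a minimal sketch of what that might look like. It assumes the `pypdf` package, the successor to PyPDF2; PyPDF2 releases from around the time of this thread used different names (for example `PdfFileWriter` and `addBookmark`). The function name `add_bookmarks` and the file paths are made up for illustration.

```python
from typing import Iterable, Tuple


def add_bookmarks(src: str, dst: str, bookmarks: Iterable[Tuple[str, int]]) -> None:
    """Copy src to dst, adding one top-level bookmark per (title, page_index).

    Hypothetical helper, not part of any library. page_index is 0-based.
    """
    # Assumption: the pypdf package is installed. Imported here so the
    # module still loads on a machine without it.
    from pypdf import PdfReader, PdfWriter

    reader = PdfReader(src)
    writer = PdfWriter()

    # Copy every page across unchanged.
    for page in reader.pages:
        writer.add_page(page)

    # Add the bookmarks (pypdf calls them outline items).
    for title, page_index in bookmarks:
        writer.add_outline_item(title, page_index)

    with open(dst, "wb") as fh:
        writer.write(fh)
```

Usage would be something like `add_bookmarks("in.pdf", "out.pdf", [("Introduction", 0), ("Results", 4)])`. Note that copying page by page like this does not carry over any bookmarks the source file already had.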

Martin_Packer · February 2, 2019, 2:45pm

Thanks @dfay. I would consider parsing a PDF, but putting it back together with edits would probably be a stretch. So a tool that might take the fiddliness out of it is a good thing.

And PyPDF2 is something I am aware of. Will need to experiment.

E_Thelonius · February 7, 2019, 10:11am

It’s not programming, but I use a combination of Hazel and ABBYY FineReader to accomplish this. It doesn’t work in every case, but it works reasonably well.

Martin_Packer · February 7, 2019, 11:13am

Thanks @E_Thelonius! Can you give some idea of what you manage to achieve with that?

E_Thelonius · February 7, 2019, 3:23pm

ABBYY FineReader recognizes the structure of a PDF and combines it into a table of contents and an index. So it is not really parsing PDFs by syntax but by their visual structure (font size etc.). I just reread your initial post and think I misunderstood you at first: grabbing the syntax of sentences is not possible with ABBYY FineReader as far as I know…

Martin_Packer · February 8, 2019, 8:12am

Well, my first use case would say “there’s a 24-point piece of text on this page; make a bookmark for this page, optionally with the text.”

If I achieved that it’d be a bit of a hack. Ideally I’d add the bookmarks at source but I can’t do that.
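A rough sketch of how that rule might be expressed in Python. The selection logic is plain Python; the extraction step assumes the `pypdf` package and its `visitor_text` hook on `extract_text`, which is my assumption rather than anything tested in this thread. The names `heading_bookmarks` and `pdf_spans` and the half-point tolerance are invented for illustration.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

Span = Tuple[int, float, str]  # (0-based page index, font size, text)


def heading_bookmarks(
    spans: Iterable[Span], size: float = 24.0, tol: float = 0.5
) -> List[Tuple[int, str]]:
    """Return one (page_index, title) per page with text at about `size` points."""
    titles: Dict[int, List[str]] = {}
    for page_index, font_size, text in spans:
        text = text.strip()
        # Keep non-empty fragments whose size is within `tol` of the target.
        if text and abs(font_size - size) <= tol:
            titles.setdefault(page_index, []).append(text)
    # Join the fragments found on each page into a single bookmark title.
    return [(page, " ".join(parts)) for page, parts in sorted(titles.items())]


def pdf_spans(path: str) -> Iterator[Span]:
    """Yield (page_index, font_size, text) for each text fragment in a PDF.

    Assumption: pypdf is installed. The size it reports is the one set in
    the text state, which may differ from the rendered size if the page
    scales its text.
    """
    from pypdf import PdfReader

    reader = PdfReader(path)
    for page_index, page in enumerate(reader.pages):
        found: List[Tuple[float, str]] = []

        def visitor(text, cm, tm, font_dict, font_size):
            found.append((font_size, text))

        page.extract_text(visitor_text=visitor)
        for font_size, text in found:
            yield page_index, font_size, text
```

`heading_bookmarks(pdf_spans("report.pdf"))` would then give a list of page/title pairs to feed to whatever writes the bookmarks. On hand-made data, `heading_bookmarks([(0, 24.0, "Chapter "), (0, 24.0, "One"), (0, 11.0, "body text")])` returns `[(0, "Chapter One")]`. Joining fragments with a space is a guess: a PDF that splits a heading mid-word would come out with stray spaces.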