Has anyone made progress with parsing PDFs or updating them programmatically, e.g. adding bookmarks?
It’s something I want to bite off one day.
I would ask this question for iOS as well.
I’d start with Python and PyPDF2. Whatever you write should work on Mac and in Pythonista on iOS with little modification.
I’ve only just explored it with this Alfred workflow but there’s a lot more it can do. See https://www.alfredforum.com/topic/9276-alfred-pdf-tools-–-optimize-and-manipulate-pdf-files/ for examples.
Also look at the open-source version of ReportLab if you want to generate documents.
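For the bookmark side, here is a rough sketch of what the PyPDF2 route could look like. It assumes PyPDF2 3.x or its successor pypdf, where the relevant names are `PdfReader`, `PdfWriter` and `add_outline_item`; older PyPDF2 releases used different names. The helpers `clean_bookmarks` and `add_bookmarks` are made up for illustration, not part of the library.

```python
def clean_bookmarks(bookmarks, page_count):
    """Keep (title, page_index) pairs that point at a real page, sorted by page."""
    valid = [(title, page) for title, page in bookmarks if 0 <= page < page_count]
    return sorted(valid, key=lambda item: item[1])


def add_bookmarks(src_path, dst_path, bookmarks):
    """Copy src_path to dst_path, adding one outline entry per (title, page_index)."""
    # Imported lazily so clean_bookmarks stays usable without the library installed.
    try:
        from pypdf import PdfReader, PdfWriter
    except ImportError:
        from PyPDF2 import PdfReader, PdfWriter  # assumption: PyPDF2 3.x naming

    reader = PdfReader(src_path)
    writer = PdfWriter()
    for page in reader.pages:
        writer.add_page(page)

    # Page indexes are zero-based: 0 is the first page.
    for title, page_index in clean_bookmarks(bookmarks, len(reader.pages)):
        writer.add_outline_item(title, page_index)

    with open(dst_path, "wb") as handle:
        writer.write(handle)
```

Usage would be something like `add_bookmarks("in.pdf", "out.pdf", [("Chapter 1", 0), ("Chapter 2", 12)])`, with hypothetical file names. As far as I know, copying pages this way does not carry over bookmarks already in the source file, so treat it as a starting point.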
Thanks @dfay. I would consider parsing a PDF, but putting it back together with edits would probably be a stretch. So a tool that takes the fiddliness out of it is a good thing.
And PyPDF2 is something I am aware of. Will need to experiment.
It is not programming, but I use a combination of Hazel and ABBYY FineReader to accomplish this. It doesn’t work every time, but it works reasonably well.
ABBYY FineReader recognizes the structure of a PDF and builds a table of contents and an index from it. So it is not really parsing PDFs by syntax but by visual structure (font size etc.). I just reread your initial post and think I misunderstood you at first: grabbing the syntax of sentences is not possible with ABBYY FineReader as far as I know…
Well, my first use case would say “there’s a 24-point piece of text on this page; make a bookmark for this page, optionally with the text.”
If I achieved that it’d be a bit of a hack. Ideally I’d add the bookmarks at source but I can’t do that.
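A minimal sketch of that hack, assuming pypdf (or PyPDF2 3.x) and its `visitor_text` hook on `extract_text`, which passes the font size along with each run of text. The function names `headings_to_bookmarks` and `collect_spans` are made up for illustration. One caveat I am fairly sure of: the size reported is the one set in the content stream, so a PDF that scales its text through the text matrix may report something other than the size you see on the page.

```python
from itertools import groupby


def headings_to_bookmarks(spans, target_size=24.0, tolerance=0.5):
    """Turn (page_index, font_size, text) spans into [(title, page_index)].

    Consecutive spans of the target size on one page are joined into a single
    title, and only the first such run per page becomes a bookmark.
    """
    def key(span):
        page_index, font_size, _ = span
        return page_index, abs(font_size - target_size) <= tolerance

    bookmarks, done_pages = [], set()
    for (page_index, is_heading), run in groupby(spans, key=key):
        # Join the fragments, then collapse any whitespace and newlines.
        title = " ".join("".join(text for _, _, text in run).split())
        if is_heading and title and page_index not in done_pages:
            bookmarks.append((title, page_index))
            done_pages.add(page_index)
    return bookmarks


def collect_spans(pdf_path):
    """Return (page_index, font_size, text) for each text run in the PDF."""
    try:
        from pypdf import PdfReader
    except ImportError:
        from PyPDF2 import PdfReader  # assumption: PyPDF2 3.x naming

    spans = []
    reader = PdfReader(pdf_path)
    for page_index, page in enumerate(reader.pages):
        # The visitor receives: text, transformation matrix, text matrix,
        # font dictionary and font size.
        def visitor(text, cm, tm, font_dict, font_size, page_index=page_index):
            spans.append((page_index, font_size, text))

        page.extract_text(visitor_text=visitor)
    return spans
```

`headings_to_bookmarks(collect_spans("report.pdf"))` (hypothetical file name) would then give a list of titles and zero-based page indexes to pass to whatever writes the bookmarks, for example a `PdfWriter` and its `add_outline_item` method.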