Automate PDF Outline

So, I’m trying to figure out if my attempts at learning various automation stuff can help out with another geeky area of my life. (And I’m sure it would come in handy for a bunch of other documents too.)

I play a Pen and Paper RPG (played lots in my younger days, but now I play seldom and focus on only one game). For some reason which is completely beyond me, when they created the Outline of the PDF of the main book it was made a bit too shallow.

That is, a huge part of the book is descriptions of Charms (sort of like spells), which are grouped according to Ability (of which there are 25). Instead of creating an Outline with Outline Items for each Ability and, one level below, Outline Items for each Charm, they stopped at the Ability level.

There are hundreds of Charms in the book, and you often reference this stuff when playing.

So I figured it would be neat to enhance the Outline with this missing Outline-Level.

Each Charm has a name, and this name is written in the same font/color/size combo and this combo is unique. So pretty much what I would like to do is have a script that adds an Outline Item with the Title being the name of the Charm. Ideally it would nest this Outline Item within the Ability Outline Item that covers it - but that’s not a big deal since going through the Outline and re-organizing them is not anywhere close to the workload of manually creating the 500 or so Outline Items.

I have PDF Expert and PDF Pen Pro. I assume this would be doable with some form of AppleScript? But I’m really not sure how I would go about this. :sweat_smile: If anyone has any similar examples that I could look at for inspiration that would be awesome.

I don’t think there’s anything that can automatically parse a PDF. The format - if you throw it into a text editor, as I have - is a little opaque.

Having said that, text is clearly distinguishable. It’d make a good weekend project for someone.
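For what it's worth, here is a rough Python sketch of how such a project might start. It only covers the logic from the question: pick out every run of text that has the Charm-name font/size/color combo, then slot each one under the existing Outline Item that covers its page. Everything in it is an assumption for illustration: the `Span` record, the function names, the font name and color value, and the `[level, title, page]` outline layout are all made up here, and it has not been tried on the actual book.

```python
from typing import NamedTuple


class Span(NamedTuple):
    """One run of uniformly styled text, as a PDF text extractor might report it.

    Hypothetical record type; the field names are illustrative only.
    """
    page: int    # 1-based page number
    y: float     # vertical position on the page (smaller = higher up)
    font: str
    size: float
    color: int   # colour as a single integer, e.g. 0x7A1F1F
    text: str


def find_headings(spans, font, size, color, tol=0.1):
    """Return sorted (page, y, title) tuples for spans matching the heading style."""
    hits = []
    for s in spans:
        if s.font == font and abs(s.size - size) <= tol and s.color == color:
            title = " ".join(s.text.split())  # collapse stray whitespace
            if title:
                hits.append((s.page, s.y, title))
    return sorted(hits)


def merge_into_outline(outline, headings):
    """Nest each heading one level below the outline entry that covers its page.

    outline:  list of [level, title, page] in document order.
    headings: sorted (page, y, title) tuples from find_headings().
    Only page numbers are compared, so a heading that shares a page with the
    start of the next entry ends up under that next entry, and headings that
    come before the first entry are filed under the first entry.
    """
    if not outline:
        return [[1, title, page] for page, _, title in headings]
    merged = []
    pending = list(headings)
    for i, (level, title, page) in enumerate(outline):
        merged.append([level, title, page])
        next_page = outline[i + 1][2] if i + 1 < len(outline) else float("inf")
        while pending and pending[0][0] < next_page:
            h_page, _, h_title = pending.pop(0)
            merged.append([level + 1, h_title, h_page])
    return merged


# Made-up sample data standing in for what an extractor would return.
spans = [
    Span(10, 120.0, "BodyFont", 10.0, 0x000000, "Some rules text."),
    Span(10, 50.0, "CharmFont", 12.0, 0x7A1F1F, "Example  Charm A"),
    Span(12, 80.0, "CharmFont", 12.0, 0x7A1F1F, "Example Charm B"),
    Span(20, 30.0, "CharmFont", 12.0, 0x7A1F1F, "Example Charm C"),
]
outline = [[1, "Charms", 9], [2, "Archery", 10], [2, "Athletics", 20]]

headings = find_headings(spans, "CharmFont", 12.0, 0x7A1F1F)
new_outline = merge_into_outline(outline, headings)
for level, title, page in new_outline:
    print("  " * (level - 1) + f"{title} (p. {page})")
```

The missing pieces are getting the styled text out of the PDF and writing the outline back. As far as I know, a library such as PyMuPDF can do both (`page.get_text("dict")` reports font, size and color per span, and `doc.get_toc()` / `doc.set_toc()` read and write the outline as `[level, title, page]` lists), but treat that as a pointer to check rather than a tested recipe.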

It’s no custom automation, but there’s an app called PDFOutliner on the Mac App Store that can auto-generate a Table of Contents based on fonts and styles.

The result depends quite heavily on the quality of the input, but as I understand it, your PDF was created digitally and is not a low-quality scan, so it should work fine.

The app hasn’t been updated in years and years, but the developer still seems to be around.

Thank you; I could use that. We machine-generate very long documents with lots of similarly styled headings. Navigating them would be much easier with an automatically generated TOC with live links.

Thanks for the tip on PDFOutliner! I bought it and tried it. Unfortunately, it didn’t work at all for the PDF in question. There are too many differences, I think, for it to meaningfully identify the right ones automatically, and its “select a type of Text and get it to create an Outline based on the same combination” feature picks up way too much of the other text in the PDF.

BUT it worked great for some other PDFs that are more normal work-related stuff without an Outline, so at least I got some value out of it! …and back to the drawing board for the fun and games Goal! :smiley:

I also tried PDFOutliner and its companion “PDF Splitter” app.

I think they’ll be useful for my needs but the developer needs to return to developing them as they are quirky and, I would say, incomplete.
