Split PDFs in a more automated way


#1

I do admin work and have to split multi-page PDF files at least 10 times a day, so automating this would save a lot of time and tedium.

I receive real estate contracts and have to split them into separate components with new file names. For example, I break a 10-page PDF contract into separate files of 2 pages, 4 pages, 3 pages, and 1 page. The number of pages in each component varies: one contract could be as described above, while the next may require splitting into 6, 3, 4, 1, 2, and 1 pages.

My current workflow is to open the file in Preview, highlight the relevant pages, drag and drop them out to Finder, then rename the files, repeating until I’ve broken the contract down. It’s tedious and time consuming, but I’m stumped on how to approach this problem.

Any suggestions? Thanks!


#2

Take a look at PDFtk. It’s a command line tool for working with PDFs.

You could combine this with any number of things such as Mac Services, Hazel, Alfred, Keyboard Maestro, AppleScript, etc. to construct something to work exactly how you want.

It should be possible, for example, to have a script that runs with a set of page numbers and uses them to pull those pages out of the original PDF and create a new one containing the pages specified, in the order specified. It might take a little while to experiment and put together, depending upon your proficiency, but it should be absolutely achievable.
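As a minimal sketch of that idea, assuming pdftk is installed and on the PATH, a short Python script could build the pdftk command from a list of page numbers. The names `build_cat_command` and `run_command` and the file names are hypothetical; `cat` and `output` are pdftk's own keywords.

```python
import subprocess


def build_cat_command(source, pages, dest):
    """Return the pdftk argument list that copies `pages` (1-based, in the
    given order) from `source` into a new file `dest`."""
    if not pages:
        raise ValueError("at least one page number is required")
    if any(page < 1 for page in pages):
        raise ValueError("page numbers start at 1")
    return ["pdftk", source, "cat", *[str(page) for page in pages], "output", dest]


def run_command(command):
    """Run the command and raise an error if pdftk reports a failure.
    Assumes pdftk is installed; nothing here checks for it."""
    subprocess.run(command, check=True)
```

For instance, `run_command(build_cat_command("contract.pdf", [3, 1, 2], "reordered.pdf"))` would write pages 3, 1, and 2 of a hypothetical `contract.pdf`, in that order, to `reordered.pdf`.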

I mainly use this tool on Windows, as it’s usually at work that I have to do this sort of manipulation, but the principle is exactly the same … it’s the same tool.


#3

There’s also a whole section in Automate the Boring Stuff on PDF manipulation.


#4

It’s a lot harder than it looks! Here’s my stab at it. It takes in the full path of the original pdf, the shared prefix of the output, and the splits as a list.
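A minimal sketch of a function with that interface (not the script originally posted here) might look like the following. It assumes pdftk is installed and on the PATH; the names `plan_splits` and `split_pdf` and the `prefix_1.pdf`, `prefix_2.pdf`, … naming scheme are illustrative assumptions.

```python
import subprocess
from pathlib import Path


def plan_splits(pdf_path, prefix, splits):
    """Turn page counts such as [2, 4, 3, 1] into a list of
    (output_path, first_page, last_page) tuples, 1-based and inclusive."""
    folder = Path(pdf_path).parent
    plan = []
    first = 1
    for index, count in enumerate(splits, start=1):
        if count < 1:
            raise ValueError("each split must contain at least one page")
        last = first + count - 1
        plan.append((str(folder / f"{prefix}_{index}.pdf"), first, last))
        first = last + 1
    return plan


def split_pdf(pdf_path, prefix, splits):
    """Write each chunk next to the original using pdftk.
    Assumes the counts in `splits` add up to the length of the PDF."""
    for out_path, first, last in plan_splits(pdf_path, prefix, splits):
        pages = str(first) if first == last else f"{first}-{last}"
        subprocess.run(
            ["pdftk", pdf_path, "cat", pages, "output", out_path],
            check=True,
        )
```

With the 10-page example from the first post, `split_pdf("/path/to/contract.pdf", "contract", [2, 4, 3, 1])` would ask pdftk for pages 1-2, 3-6, 7-9, and 10 in turn.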


#5

Just for comparison purposes (and for the record, I’m all for a bit of Python to solve a problem :nerd_face:), here’s an example which splits out chunks of PDFs to separate files.

pdftk foo-bar.pdf cat 1-12 output foo.pdf
pdftk foo-bar.pdf cat 13-end output bar.pdf

Excerpt from https://stackoverflow.com/questions/17776582/split-a-pdf-in-two

This one combines two PDFs into one.

pdftk in1.pdf in2.pdf cat output out1.pdf

Example taken from https://www.pdflabs.com/docs/pdftk-cli-examples/

This removes pages 11 through 20 and outputs the remaining pages to a new PDF.

pdftk in.pdf cat 1-10 21-end output out1.pdf

Example based on one taken from https://www.pdflabs.com/docs/pdftk-cli-examples/
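If the pages to drop change from file to file, the ranges to keep can be worked out rather than typed by hand. This is only a sketch: `ranges_without` is a hypothetical helper, and it assumes the total page count is already known.

```python
def ranges_without(first_removed, last_removed, total_pages):
    """Return the pdftk `cat` page ranges that keep every page except the
    1-based inclusive span first_removed..last_removed."""
    if not 1 <= first_removed <= last_removed <= total_pages:
        raise ValueError("expected 1 <= first_removed <= last_removed <= total_pages")

    def span(first, last):
        # A single page is written as "7"; a run of pages as "7-9".
        return str(first) if first == last else f"{first}-{last}"

    kept = []
    if first_removed > 1:
        kept.append(span(1, first_removed - 1))
    if last_removed < total_pages:
        kept.append(span(last_removed + 1, total_pages))
    if not kept:
        raise ValueError("removing every page would leave an empty PDF")
    return kept
```

For a 30-page file, `ranges_without(11, 20, 30)` gives the two ranges to place between `cat` and `output` in a command like the one above.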


#6

Thanks to all for the suggestions and code samples on this. Way above my knowledge level since I’ve never used a script or code of any kind. Maybe it’s time to make this a weekend project and learn some new stuff! :grinning:


#7

Here’s something worth considering:

Automator has a built-in set of PDF manipulation actions, including “Split PDF”, which you could use in a workflow to render a PDF file into separate files.

Maybe that’ll be useful?

Cheers – SAL


#8

How do you know where to split the file? Are the contracts bookmarked by section? If so, the job is trivial. Is there a constant word, font size, or something that the computer could look for to know where to split the document?
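If the contracts do turn out to be bookmarked, here is a rough sketch of how the split points could be read. It assumes the text comes from `pdftk file.pdf dump_data` and uses the `NumberOfPages:`, `BookmarkBegin`, `BookmarkTitle:`, `BookmarkLevel:`, and `BookmarkPageNumber:` lines that recent pdftk versions print; `bookmark_ranges` is a hypothetical helper, and it assumes every section starts on a new page.

```python
def bookmark_ranges(dump_text, level=1):
    """Parse pdftk dump_data text and return (title, first_page, last_page)
    for each bookmark at the given level, in document order."""
    total_pages = None
    entries = []
    for line in dump_text.splitlines():
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "NumberOfPages":
            total_pages = int(value)
        elif key == "BookmarkBegin":
            entries.append({})  # start collecting a new bookmark record
        elif key.startswith("Bookmark") and entries:
            entries[-1][key] = value
    if total_pages is None:
        raise ValueError("no NumberOfPages line found")

    # Keep only complete bookmark records at the requested level.
    starts = [
        (entry["BookmarkTitle"], int(entry["BookmarkPageNumber"]))
        for entry in entries
        if "BookmarkTitle" in entry
        and "BookmarkPageNumber" in entry
        and int(entry.get("BookmarkLevel", 0)) == level
    ]

    ranges = []
    for index, (title, first) in enumerate(starts):
        # A section ends on the page before the next bookmark starts;
        # the last section runs to the end of the document.
        if index + 1 < len(starts):
            last = starts[index + 1][1] - 1
        else:
            last = total_pages
        ranges.append((title, first, last))
    return ranges
```

Each resulting tuple could then drive one `pdftk ... cat first-last output ...` call, with the bookmark title used as the new file name.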


#9

I have a similar situation. I teach a class online through Google Hangouts to a school in another country. Every day my aide scans all the students’ worksheets, and the scanner/copier emails them to me as a single PDF.

I use PDFpen Pro and the simple Keyboard Maestro workflow below. I, too, select the pages to be grouped, and then apply my keyboard shortcut (set in KM as control-command-A). The workflow creates a new PDF document with the selected pages and then brings up the “Save” dialog. I type in each student’s initials and paste in the name of the worksheet. It is not completely automated, but it saves several steps.