Command-line PDF toolkit for joining, splitting, and modifying PDF documents


In the olden days, if you needed a few pages from a library reference book, you would head for the photocopier. These days, you can quickly snap a few photos with your phone. Or, if you have access to the reference book as a PDF, you can cleanly extract those pages using pdftk.

pdftk is a command-line utility that operates on PDF documents. Multiple PDFs can be fed in as input, and any combination of their pages can then be used to form the output. Command-line options specify the desired page ranges (for instance, all even-numbered pages in 11-71, plus all pages in 315-319). Document operations can be automated by calling pdftk from a script, e.g., via the Python subprocess module.
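
As a rough illustration of the scripted approach, here is a minimal sketch that builds and runs a pdftk `cat` command for the page ranges mentioned above. The helper names (`build_cat_command`, `extract_pages`) and the file names in the usage note are hypothetical, and the sketch assumes pdftk is installed and on your PATH.

```python
import subprocess
from typing import List, Sequence


def build_cat_command(input_pdf: str, page_ranges: Sequence[str], output_pdf: str) -> List[str]:
    """Build the pdftk argument list that copies page_ranges from input_pdf to output_pdf.

    Hypothetical helper for illustration. Each range uses pdftk's own range
    syntax, e.g. "11-71even" or "315-319".
    """
    if not page_ranges:
        raise ValueError("at least one page range is required")
    # "A" is an arbitrary handle bound to the input file; prefixing each
    # range with it tells pdftk which input the pages come from.
    ranges = [f"A{page_range}" for page_range in page_ranges]
    return ["pdftk", f"A={input_pdf}", "cat", *ranges, "output", output_pdf]


def extract_pages(input_pdf: str, page_ranges: Sequence[str], output_pdf: str) -> None:
    """Run pdftk; raises CalledProcessError if pdftk exits with an error."""
    # Passing a list (not a shell string) avoids quoting problems with
    # file names that contain spaces.
    subprocess.run(build_cat_command(input_pdf, page_ranges, output_pdf), check=True)
```

For example, `extract_pages("reference.pdf", ["11-71even", "315-319"], "excerpt.pdf")` would run `pdftk A=reference.pdf cat A11-71even A315-319 output excerpt.pdf`. Keeping the command construction separate from the subprocess call makes it easy to inspect or log the command before running it.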