PyPDF2 split pdf by pages

Acamori · Jul 17, 2017

I want to split a PDF file using PyPDF2.

All the examples I found online are too complicated, don't work, or always fail with the error "AttributeError: 'PdfFileWriter' object has no attribute 'stream'".

Can someone help with this? I need to split one 3-page PDF into three separate files.

Here is what I have so far:

import PyPDF2

pdfFileObj = open(r"D:\BPO\act.pdf", 'rb')
pdfReader = PyPDF2.PdfFileReader(pdfFileObj)
pdfWriter = PyPDF2.PdfFileWriter()
pdfWriter.addPage(pdfReader.getPage(0))  # copy the first page into the writer

But I don't know what to do next :(

EDIT#1

I tried to do the splitting in a loop, but I have a problem: PdfFileWriter produces three files, the first with one page, the second with two, and the third with three. Where is my mistake in the following code?

from PyPDF2 import PdfFileReader, PdfFileWriter

act_sub_pages_name = ['p01.pdf', 'p02.pdf', 'p03.pdf']
with open(r"D:\BPO\act.pdf", 'rb') as act_mls:
    reader = PdfFileReader(act_mls)
    writer = PdfFileWriter()  # created once, before the loop
    if reader.numPages == 3:
        counter = 0
        for x in range(3):
            path = '\\'.join(['D:\\BPO\\act sub pages', act_sub_pages_name[counter]])
            counter += 1
            writer.addPage(reader.getPage(x))
            with open(path, 'wb') as outfile:
                writer.write(outfile)

Sorry for my bad English.

EDIT#2

My solution, based on Paul Rooney's answer:

import os
from PyPDF2 import PdfFileReader, PdfFileWriter

act_pdf_file = 'D:\\BPO\\act.pdf'
act_sub_pages_name = ['p01.pdf', 'p02.pdf', 'p03.pdf']

def pdf_splitter(index, src_file):
    # Reopen the source and write page `index` into its own output file
    with open(src_file, 'rb') as act_mls:
        reader = PdfFileReader(act_mls)
        writer = PdfFileWriter()  # a new writer per call, so each file gets one page
        writer.addPage(reader.getPage(index))
        out_file = os.path.join('D:\\BPO\\act sub pages', act_sub_pages_name[index])
        with open(out_file, 'wb') as out_pdf:
            writer.write(out_pdf)

for x in range(3):
    pdf_splitter(x, act_pdf_file)

With the function everything works properly, although it is a little more complicated.

Answer

Paul Rooney · Jul 17, 2017

You can use the write method of the PdfFileWriter to write out to the file.

from PyPDF2 import PdfFileReader, PdfFileWriter

with open("input.pdf", 'rb') as infile:

    reader = PdfFileReader(infile)
    writer = PdfFileWriter()
    writer.addPage(reader.getPage(0))  # copy the first page into the new writer

    # Write while infile is still open; the writer reads page data from it
    with open('output.pdf', 'wb') as outfile:
        writer.write(outfile)

To split the whole file, you may want to loop over the pages of the input file, and for each page create a new writer object and add that single page to it. Then write it out to an ever-incrementing filename, or use some other scheme for deciding the output filename.