Generate flattened PDF with Python

MakeCents · Nov 19, 2014

When I print any of my source PDFs to PDF, the file size drops and the text boxes present in the form are removed. In short, printing flattens the file. This is the behavior I want to achieve.

The code below creates a PDF using another PDF as its source (the one I want to flatten), but it writes the form's text boxes to the output as well.

Can I get a PDF without the text boxes, that is, a flattened one? Just like Adobe produces when I print a PDF to PDF.

My code looks something like this, with some parts omitted:

import os
import StringIO
from pyPdf import PdfFileWriter, PdfFileReader
from reportlab.pdfgen import canvas
from reportlab.lib.pagesizes import letter

directory = os.path.join(os.getcwd(), "source")  # dir we are interested in
fif = [f for f in os.listdir(directory) if f[-3:] == 'pdf'] # get the PDFs
for i in fif:
    packet = StringIO.StringIO()
    can = canvas.Canvas(packet, pagesize=letter)
    can.rotate(-90)
    can.save()

    packet.seek(0)
    new_pdf = PdfFileReader(packet)
    fname = os.path.join('source', i)
    existing_pdf = PdfFileReader(file(fname, "rb"))
    output = PdfFileWriter()
    nump = existing_pdf.getNumPages()
    page = existing_pdf.getPage(0)
    for l in range(nump):
        output.addPage(existing_pdf.getPage(l))
    page.mergePage(new_pdf.getPage(0))
    outputStream = file("out-"+i, "wb")
    output.write(outputStream)
    outputStream.close()
    print fname + " written as", "out-" + i

Summing up: I have a PDF, I add a text box to it (covering up old info and adding new info), and then I print a PDF from that PDF. After printing, the text box is no longer editable or movable. I wanted to automate that process, but everything I tried still left the text box editable.

Answer

naktinis · Nov 23, 2015

If installing an OS package is an option, you could use pdftk with its Python wrapper pypdftk, like this:

import pypdftk
pypdftk.fill_form('filled.pdf', out_file='flattened.pdf', flatten=True)

You would also need to install the pdftk package, which on Ubuntu could be done like this:

sudo apt-get install pdftk

The pypdftk library can be installed from PyPI:

pip install pypdftk
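
If you would rather not add the wrapper, pdftk's own command line has a flatten operation (pdftk in.pdf output out.pdf flatten) that can be called directly. The following is only a minimal sketch, assuming Python 3 and that the pdftk executable is on the PATH; the function names build_flatten_command and flatten_pdf are made up for illustration and are not part of pdftk or pypdftk.

```python
import shutil
import subprocess


def build_flatten_command(src, dst, pdftk="pdftk"):
    """Return the argument list for: pdftk <src> output <dst> flatten."""
    return [pdftk, src, "output", dst, "flatten"]


def flatten_pdf(src, dst):
    """Flatten the form fields of src and write the result to dst.

    Hypothetical helper: assumes the pdftk executable is installed.
    """
    if shutil.which("pdftk") is None:
        raise RuntimeError("pdftk executable not found on PATH")
    # check=True raises CalledProcessError if pdftk exits with an error.
    subprocess.run(build_flatten_command(src, dst), check=True)
```

Calling flatten_pdf('filled.pdf', 'flattened.pdf') should then give the same kind of result as the pypdftk call above, with the text boxes merged into the page content.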