Merge Existing PDF into new ReportLab PDF via flowables

kyleturner · Nov 13, 2012 · Viewed 11.9k times

I have a reportlab SimpleDocTemplate that I return as a dynamic PDF. I am generating its content based on some Django model metadata. Here's my template setup:

buff = StringIO()
doc = SimpleDocTemplate(buff, pagesize=letter,
                        rightMargin=72,leftMargin=72,
                        topMargin=72,bottomMargin=18)
Story = []

I can easily add textual metadata from the Entry model into the Story list to be built later:

    ptext = '<font size=20>%s</font>' % entry.title.title()
    paragraph = Paragraph(ptext, custom_styles["Custom"])
    Story.append(paragraph)

And then generate the PDF to be returned in the response by calling build on the SimpleDocTemplate:

doc.build(Story, onFirstPage=entry_page_template, onLaterPages=entry_page_template)

pdf = buff.getvalue()
resp = HttpResponse(content_type='application/x-download')
resp['Content-Disposition'] = 'attachment;filename=logbook.pdf'
resp.write(pdf)
return resp

One metadata field on the model is a file attachment. When an attachment is a PDF, I'd like to merge it into the Story that I am generating, i.e. add the PDF as a reportlab "flowable".

I'm attempting to do so using pdfrw, but haven't had any luck. Ideally I'd love to just call:

from pdfrw import PdfReader
pdf = PdfReader(entry.document.file.path)
Story.append(pdf)

and append the pdf to the existing Story list to be included in the generation of the final document, as noted above.

Anyone have any ideas? I tried something similar using pagexobj to create the pdf, trying to follow this example:

http://code.google.com/p/pdfrw/source/browse/trunk/examples/rl1/subset.py

from pdfrw.buildxobj import pagexobj
from pdfrw.toreportlab import makerl

pdf = pagexobj(PdfReader(entry.document.file.path))

But I didn't have any luck with that either. Can someone explain the best way to merge an existing PDF file into a reportlab flowable? I'm no good with this stuff and have been banging my head on PDF generation for days now. :) Any direction is greatly appreciated!

Answer

RyanBrady · Feb 6, 2013

I just had a similar task in a project. I used reportlab (open source version) to generate pdf files and pyPDF to facilitate the merge. My requirements were slightly different in that I just needed one page from each attachment, but I'm sure this is probably close enough for you to get the general idea.

from datetime import datetime

from django.conf import settings
from pyPdf import PdfFileReader, PdfFileWriter

def create_merged_pdf(user):
    basepath = settings.MEDIA_ROOT + "/"
    # following block calls the function that uses reportlab to generate a pdf
    coversheet_path = basepath + "%s_%s_cover_%s.pdf" %(user.first_name, user.last_name, datetime.now().strftime("%f"))
    create_cover_sheet(coversheet_path, user, user.performancereview_set.all())

    # now use the cover sheet and all of the performance reviews to create a merged pdf
    merged_path = basepath + "%s_%s_merged_%s.pdf" %(user.first_name, user.last_name, datetime.now().strftime("%f"))

    # for merged file result
    output = PdfFileWriter()

    # for each pdf file to add, open in a PdfFileReader object and add page to output
    cover_pdf = PdfFileReader(open(coversheet_path, "rb"))
    output.addPage(cover_pdf.getPage(0))

    # iterate through attached files and merge.  I only needed the first page, YMMV
    for review in user.performancereview_set.all():
        review_pdf = PdfFileReader(open(review.pdf_file.file.name, "rb"))
        output.addPage(review_pdf.getPage(0)) # only first page of attachment

    # write out the merged file
    with open(merged_path, "wb") as output_stream:
        output.write(output_stream)