Opening PDF URLs with pyPdf

meadhikari · Mar 17, 2012 · Viewed 20.3k times

How would I open a PDF from a URL instead of from disk?

Something like:

input1 = PdfFileReader(file("http://example.com/a.pdf", "rb"))

I want to open several files from the web, merge them, and download the result as a single PDF.

Answer

John · Mar 17, 2012

I think urllib2 will get you what you want: download the bytes, wrap them in an in-memory file object, and hand that to PdfFileReader.

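# Python 2 code: urllib2, StringIO and xrange do not exist under these names in Python 3.
from urllib2 import Request, urlopen
from pyPdf import PdfFileWriter, PdfFileReader
from StringIO import StringIO

url = "http://www.silicontao.com/ProgrammingGuide/other/beejnet.pdf"
writer = PdfFileWriter()

# Download the whole PDF into memory, then wrap the bytes in a file-like
# object so PdfFileReader can seek within it as it would with a file on disk.
remoteFile = urlopen(Request(url)).read()
memoryFile = StringIO(remoteFile)
pdfFile = PdfFileReader(memoryFile)

# Copy every page into the writer. To merge several files, repeat the
# download and this loop for each URL, reusing the same writer.
for pageNum in xrange(pdfFile.getNumPages()):
    currentPage = pdfFile.getPage(pageNum)
    #currentPage.mergePage(watermark.getPage(0))
    writer.addPage(currentPage)

# Write the collected pages to a single output file.
outputStream = open("output.pdf", "wb")
writer.write(outputStream)
outputStream.close()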
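
Since the question asks about merging several files, here is a rough sketch of how the same idea might look for a list of URLs. It is not part of the answer above and rests on a few assumptions: it targets Python 3 (urllib.request and io.BytesIO in place of urllib2 and StringIO) and the third-party pypdf package, a maintained descendant of pyPdf. The function names fetch_pdf and merge_pdfs, and the opener parameter, are made up for illustration.

```python
from io import BytesIO
from urllib.request import urlopen


def fetch_pdf(url, opener=urlopen):
    """Download url and return its bytes wrapped in a seekable in-memory file.

    `opener` is a hypothetical injection point (defaults to urllib's urlopen)
    so the function can be exercised without network access.
    """
    with opener(url) as response:
        return BytesIO(response.read())


def merge_pdfs(urls, output_path, opener=urlopen):
    """Append the pages of every PDF in `urls`, in order, to one output file."""
    # Assumption: the third-party `pypdf` package is installed. It is imported
    # here, not at module level, so fetch_pdf stays usable without it.
    from pypdf import PdfReader, PdfWriter

    writer = PdfWriter()
    for url in urls:
        reader = PdfReader(fetch_pdf(url, opener))
        for page in reader.pages:
            writer.add_page(page)

    with open(output_path, "wb") as output_stream:
        writer.write(output_stream)
```

Calling merge_pdfs(["http://example.com/a.pdf", "http://example.com/b.pdf"], "merged.pdf") would then write all pages to merged.pdf in the order the URLs are given (both URLs are placeholders). Each file is held fully in memory while it is read, which should be fine for a handful of ordinary documents but may not suit very large ones.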