REQUESTS: Return file object from url (as with open('','rb') )

TimY picture TimY · May 5, 2015 · Viewed 6.9k times · Source

I want to download a file straight into memory using requests in order to pass it directly to PyPDF2 reader avoiding writing it to disk, but I can't figure out how to pass it as a file object. Here's what I've tried:

import requests as req
from PyPDF2 import PdfFileReader

r_file = req.get('http://www.location.come/somefile.pdf')
rs_file = req.get('http://www.location.come/somefile.pdf', stream=True)

with open('/location/somefile.pdf', 'wb') as f:
    for chunk in r_file.iter_content():
        f.write(chunk)

local_file = open('/location/somefile.pdf', 'rb')

#Works:
pdf = PdfFileReader(local_file)

#As expected, these don't work:
pdf = PdfFileReader(rs_file)
pdf = PdfFileReader(r_file)
pdf = PdfFileReader(rs_file.content)
pdf = PdfFileReader(r_file.content)
pdf = PdfFileReader(rs_file.raw)
pdf = PdfFileReader(r_file.raw)

Answer

abarnert picture abarnert · May 5, 2015

Without having to know anything about requests, you can always make a file-like object out of anything you have in memory as a string using StringIO.

In particular:

  • Python 2 StringIO.StringIO(s) is a binary file.
  • Python 2 cStringIO.StringIO(s) is the same, but possibly more efficient.
  • Python 3 io.BytesIO(b) is a binary file (constructed from bytes).
  • Python 3 io.StringIO(s) is a Unicode text file.
  • Python 2 io.BytesIO(s) is a binary file.
  • Python 2 io.StringIO(u) is a Unicode text file (constructed from unicode).

(The first two are "binary" in the Python 2 sense--no line-ending conversion. The others are "binary" vs. "text" in the Python 3 sense--bytes vs. Unicode.)

So, io.BytesIO(response.content) gives you a valid binary file-like object in both Python 2 and Python 3. If you only care about Python 2, cStringIO.StringIO(response.content) may be more efficient.

Of course "file-like" only goes so far; if the library tries to, e.g., call the fileno method and start making C calls against the file descriptor it isn't going to work. But 99% of the time, this works.