I want to download a file straight into memory using requests in order to pass it directly to the PyPDF2 reader, avoiding writing it to disk, but I can't figure out how to pass it as a file object. Here's what I've tried:
import requests as req
from PyPDF2 import PdfFileReader

r_file = req.get('http://www.location.come/somefile.pdf')
rs_file = req.get('http://www.location.come/somefile.pdf', stream=True)

with open('/location/somefile.pdf', 'wb') as f:
    for chunk in r_file.iter_content():
        f.write(chunk)

local_file = open('/location/somefile.pdf', 'rb')

# Works:
pdf = PdfFileReader(local_file)

# As expected, these don't work:
pdf = PdfFileReader(rs_file)
pdf = PdfFileReader(r_file)
pdf = PdfFileReader(rs_file.content)
pdf = PdfFileReader(r_file.content)
pdf = PdfFileReader(rs_file.raw)
pdf = PdfFileReader(r_file.raw)
Without having to know anything about requests, you can always make a file-like object out of anything you have in memory as a string using StringIO.
In particular:
Python 2: StringIO.StringIO(s) is a binary file.
Python 2: cStringIO.StringIO(s) is the same, but possibly more efficient.
Python 3: io.BytesIO(b) is a binary file (constructed from bytes).
Python 3: io.StringIO(s) is a Unicode text file.
Python 2.6+: io.BytesIO(s) is a binary file.
Python 2.6+: io.StringIO(u) is a Unicode text file (constructed from unicode).
(The first two are "binary" in the Python 2 sense: no line-ending conversion. The others are "binary" vs. "text" in the Python 3 sense: bytes vs. Unicode.)
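To make the bytes vs. text distinction concrete, here is a small Python 3 sketch. The sample data and variable names are made up for illustration; only io.BytesIO and io.StringIO come from the answer.

```python
import io

# Binary file-like object: built from bytes, reads back bytes.
binary_file = io.BytesIO(b"%PDF-1.4 sample")
binary_chunk = binary_file.read(4)

# Text file-like object: built from str, reads back str.
text_file = io.StringIO(u"plain text")
text_chunk = text_file.read(5)

# Mixing the two up fails at construction time in Python 3.
try:
    io.StringIO(b"raw bytes")
    mixed_up_ok = True
except TypeError:
    mixed_up_ok = False
```

A PDF is binary data, so the bytes-based variant is the one that applies to the question.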
So, io.BytesIO(response.content) gives you a valid binary file-like object in both Python 2 and Python 3. If you only care about Python 2, cStringIO.StringIO(response.content) may be more efficient.
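Here is a rough sketch of that pattern that runs without a network connection or PyPDF2 installed. As an assumption for illustration, the standard-library zipfile module stands in for the PDF reader (it also needs a seekable binary file), and a ZIP archive built in memory stands in for response.content. The helper make_zip_bytes is hypothetical.

```python
import io
import zipfile

def make_zip_bytes():
    """Build a small ZIP archive in memory; stands in for response.content."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("hello.txt", "hello from memory")
    return buf.getvalue()

payload = make_zip_bytes()        # plays the role of r_file.content (bytes)
file_obj = io.BytesIO(payload)    # seekable, binary, file-like wrapper

# A library that expects an open binary file accepts the wrapper directly.
with zipfile.ZipFile(file_obj) as zf:
    names = zf.namelist()
    text = zf.read("hello.txt").decode("utf-8")

# With requests and PyPDF2, the equivalent call from the question would be:
#     pdf = PdfFileReader(io.BytesIO(r_file.content))
```

Note that the non-streaming response (r_file in the question) is the natural fit here, since .content already holds the whole body as bytes.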
Of course, "file-like" only goes so far; if the library tries to, e.g., call the fileno method and starts making C calls against the file descriptor, it isn't going to work. But 99% of the time, this works.
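A quick way to see that limitation, and one possible fallback, is sketched below. The helper has_real_descriptor and the placeholder bytes are hypothetical, and falling back to a temporary file is my assumption about a reasonable workaround rather than something the answer prescribes.

```python
import io
import tempfile

def has_real_descriptor(f):
    """Return True if f is backed by an OS-level file descriptor."""
    try:
        f.fileno()
        return True
    except (AttributeError, io.UnsupportedOperation):
        return False

mem = io.BytesIO(b"%PDF-1.4 placeholder bytes")
in_memory_ok = has_real_descriptor(mem)   # BytesIO has no descriptor

# Fallback: spill the bytes to a temporary file only when a descriptor is required.
with tempfile.TemporaryFile() as tmp:
    tmp.write(mem.getvalue())
    tmp.seek(0)
    on_disk_ok = has_real_descriptor(tmp)
    first = tmp.read(4)
```

This does touch the disk, so it only makes sense for the rare library that insists on a real file descriptor.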