Python writing binary files, bytes

Turtles Are Cute · May 19, 2013 · Viewed 40.1k times

Python 3. I'm using Qt's file dialog widget to save PDFs downloaded from the internet. I've been reading the file using 'open', and attempting to write it using the file dialog widget. However, I've been running into a "TypeError: '_io.BufferedReader' does not support the buffer interface" error.

Example code:

with open('file_to_read.pdf', 'rb') as f1: 
    with open('file_to_save.pdf', 'wb') as f2:
        f2.write(f1)

This logic works properly with text files when not using the 'b' designator, or when reading a file from the web, like with urllib or requests. Those are of the 'bytes' type, which I think is what I need the file to be opened as. Instead, it's opening as a BufferedReader. I tried bytes(f1), but got "TypeError: 'bytes' object cannot be interpreted as an integer." Any ideas?

Answer

dawg · May 19, 2013

If your intent is simply to make a copy of the file, you could use shutil:

>>> import shutil
>>> shutil.copyfile('file_to_read.pdf','file_to_save.pdf')
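If you already have open file objects rather than paths (for instance, after a save dialog has handed you a destination), shutil.copyfileobj may be a closer fit. A minimal sketch, assuming Python 3; the helper name copy_stream and the throwaway demo files are made up for illustration:

```python
import os
import shutil
import tempfile


def copy_stream(src_path, dst_path):
    """Copy src_path to dst_path through open binary file objects."""
    with open(src_path, 'rb') as f1:
        with open(dst_path, 'wb') as f2:
            # copyfileobj reads from f1 and writes to f2 in chunks,
            # so the whole file is never held in memory at once.
            shutil.copyfileobj(f1, f2)


# Demo with throwaway files (hypothetical names and contents).
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, 'file_to_read.pdf')
dst = os.path.join(workdir, 'file_to_save.pdf')
with open(src, 'wb') as f:
    f.write(b'%PDF-1.4 example bytes\x00\xff')

copy_stream(src, dst)
```

Unlike copyfile, this works with any pair of readable and writable binary file-like objects, not just paths on disk.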

Or if you need to access byte by byte, similar to your structure, this works:

>>> with open('/tmp/fin.pdf','rb') as f1:
...    with open('/tmp/test.pdf','wb') as f2:
...       while True:
...          b=f1.read(1)
...          if b: 
...             # process b if this is your intent   
...             n=f2.write(b)
...          else: break

But copying byte by byte is potentially really slow, since every byte costs a separate read and write call.

Or, if you want a buffer that will speed this up (without taking the risk of reading an unknown file size completely into memory):

>>> with open('/tmp/fin.pdf','rb') as f1:
...    with open('/tmp/test.pdf','wb') as f2:
...       while True:
...          buf=f1.read(1024)
...          if buf: 
...              for byte in buf:
...                 pass    # process the bytes if this is what you want
...                         # make sure your changes are in buf
...              n=f2.write(buf)
...          else:
...              break
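The same buffered loop can be written a little more compactly with the two-argument form of iter, which keeps calling f1.read until it returns an empty bytes object. This is only a sketch, assuming Python 3; the function name copy_in_chunks, the default chunk size, and the demo files are illustrative choices, not anything from the question:

```python
import os
import tempfile
from functools import partial


def copy_in_chunks(src_path, dst_path, chunk_size=1024):
    """Copy a binary file chunk by chunk; return the number of bytes written."""
    total = 0
    with open(src_path, 'rb') as f1:
        with open(dst_path, 'wb') as f2:
            # iter(callable, sentinel) calls f1.read(chunk_size) repeatedly
            # and stops as soon as it returns b'' (end of file).
            for buf in iter(partial(f1.read, chunk_size), b''):
                # process buf here if needed before writing it out
                total += f2.write(buf)
    return total


# Demo with a throwaway 2500-byte file (hypothetical name and contents).
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, 'fin.pdf')
dst = os.path.join(workdir, 'test.pdf')
with open(src, 'wb') as f:
    f.write(bytes(range(250)) * 10)

written = copy_in_chunks(src, dst)
```

Note that in Python 3, iterating over a bytes object yields integers, and bytes are immutable, so any per-byte processing has to build a new bytes or bytearray object to write.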

With Python 2.7+ or 3.1+ you can also use this shortcut (rather than using two with blocks):

with open('/tmp/fin.pdf','rb') as f1,open('/tmp/test.pdf','wb') as f2:
    ...
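For reference, the error in the question comes from passing the file object itself to write(), which expects bytes. If the file is known to be small enough to fit in memory, calling read() on it first is likely the smallest fix. A sketch assuming Python 3, with throwaway demo files standing in for the real PDFs:

```python
import os
import tempfile

# Throwaway demo files (hypothetical names and contents).
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, 'file_to_read.pdf')
dst = os.path.join(workdir, 'file_to_save.pdf')
with open(src, 'wb') as f:
    f.write(b'%PDF-1.4 small example\x00\x01\x02')

with open(src, 'rb') as f1, open(dst, 'wb') as f2:
    # f1 is a BufferedReader, not bytes; f1.read() returns the
    # file's entire contents as a bytes object that write() accepts.
    f2.write(f1.read())
```

This reads the entire file at once, so the chunked approach is the safer choice for files of unknown size.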