A simple program for reading a CSV file inside a zip file works in Python 2.7, but not in Python 3.2
$ cat test_zip_file_py3k.py
import csv, sys, zipfile
zip_file = zipfile.ZipFile(sys.argv[1])
items_file = zip_file.open('items.csv', 'rU')
for row in csv.DictReader(items_file):
pass
$ python2.7 test_zip_file_py3k.py ~/data.zip
$ python3.2 test_zip_file_py3k.py ~/data.zip
Traceback (most recent call last):
File "test_zip_file_py3k.py", line 8, in <module>
for row in csv.DictReader(items_file):
File "/home/msabramo/run/lib/python3.2/csv.py", line 109, in __next__
self.fieldnames
File "/home/msabramo/run/lib/python3.2/csv.py", line 96, in fieldnames
self._fieldnames = next(self.reader)
_csv.Error: iterator should return strings, not bytes (did you open the file
in text mode?)
So the csv
module in Python 3 wants to see a text file, but zipfile.ZipFile.open
returns a zipfile.ZipExtFile
that is always treated as binary data.
How does one make this work in Python 3?
I just noticed that Lennart's answer didn't work with Python 3.1, but it does work with Python 3.2. They've enhanced zipfile.ZipExtFile
in Python 3.2 (see release notes). These changes appear to make zipfile.ZipExtFile
work nicely with io.TextWrapper
.
Incidentally, it works in Python 3.1, if you uncomment the hacky lines below to monkey-patch zipfile.ZipExtFile
, not that I would recommend this sort of hackery. I include it only to illustrate the essence of what was done in Python 3.2 to make things work nicely.
$ cat test_zip_file_py3k.py
import csv, io, sys, zipfile
zip_file = zipfile.ZipFile(sys.argv[1])
items_file = zip_file.open('items.csv', 'rU')
# items_file.readable = lambda: True
# items_file.writable = lambda: False
# items_file.seekable = lambda: False
# items_file.read1 = items_file.read
items_file = io.TextIOWrapper(items_file)
for idx, row in enumerate(csv.DictReader(items_file)):
print('Processing row {0} -- row = {1}'.format(idx, row))
If I had to support py3k < 3.2, then I would go with the solution in my other answer.