zipfile can't handle some types of zip data?

hyperboreean · Feb 7, 2011

I came across this problem while trying to decompress a zip file.

-- zipfile.is_zipfile(my_file) always returns False, even though the UNIX unzip command handles the file just fine. zipfile.ZipFile(path/file_handle_to_path) fails in the same way: it doesn't recognize the file as a zip archive.

-- the file command reports "Zip archive data, at least v2.0 to extract", and viewing the file with less shows:

    PKZIP for iSeries by PKWARE
        Length  Method       Size  Cmpr        Date   Time    CRC-32  Name
    2113482674  Defl:S  204502989   90%  2010-11-01  08:39  2cee662e  myfile.txt
    2113482674          204502989   90%                               1 file
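The two observations above point at different ends of the file. As far as I know, file identifies a zip by the local file header signature PK\x03\x04 at offset 0, whereas Python's zipfile locates the archive through the end of central directory (EOCD) record at the end of the file. Here is a minimal sketch for checking both ends; the helper names are hypothetical, and it assumes the whole file fits in memory and that the last PK\x05\x06 in the file is the real EOCD record.

```python
import struct

LOCAL_HEADER_SIGNATURE = b"PK\x03\x04"  # signature at the start of the first file entry
EOCD_SIGNATURE = b"PK\x05\x06"          # end of central directory signature
EOCD_FORMAT = "<4s4H2LH"                # 22-byte EOCD record, little-endian
EOCD_SIZE = struct.calcsize(EOCD_FORMAT)


def starts_like_zip(data: bytes) -> bool:
    """True if the data begins with a local file header signature."""
    return data.startswith(LOCAL_HEADER_SIGNATURE)


def trailing_bytes(data: bytes) -> int:
    """Count the bytes after the last EOCD record and its archive comment.

    Raises ValueError if no complete EOCD record is found.
    A negative result means the declared comment is cut off.
    """
    pos = data.rfind(EOCD_SIGNATURE)
    if pos < 0 or len(data) < pos + EOCD_SIZE:
        raise ValueError("no complete end of central directory record found")
    # Fields: signature, this disk, central directory start disk,
    # entries on this disk, total entries, central directory size,
    # central directory offset, archive comment length.
    *_, comment_len = struct.unpack(EOCD_FORMAT, data[pos:pos + EOCD_SIZE])
    return len(data) - (pos + EOCD_SIZE + comment_len)
```

For example, with data = open("myfile.zip", "rb").read(), a nonzero trailing_bytes(data) means there is extra data after the archive proper, even if starts_like_zip(data) is True.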

Any ideas how I can get around this issue? It would be nice if I could make Python's zipfile work, since I already have some unit tests that I'd have to drop if I switched to running subprocess.call("unzip").

Answer

Uri Cohen · Sep 17, 2011

I ran into the same issue with my files and was able to solve it. As with the file in the question, I'm not sure how they were generated. They all had trailing data at the end, which both Windows and 7z ignored but which made Python's zipfile fail.
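Here is a small reproduction of that failure mode, as a sketch under stated assumptions: the trailing data is a hypothetical 100,000 zero bytes (the real files' extra data is unknown), and, as far as I can tell from CPython's zipfile source, the module only searches roughly the last 64 KiB of a file for the end of central directory (EOCD) record. Recent versions appear to tolerate smaller amounts of trailing data, so the example deliberately uses more than that.

```python
import io
import zipfile

EOCD_SIGNATURE = b"PK\x05\x06"  # end of central directory signature
EOCD_SIZE = 22                   # fixed size of the EOCD record


def make_archive() -> bytes:
    """Build a small, valid zip archive in memory (no archive comment)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("myfile.txt", "hello " * 1000)
    return buf.getvalue()


good = make_archive()
# Hypothetical stand-in for the unknown trailing data: 100,000 zero bytes,
# more than the ~64 KiB that zipfile scans when looking for the EOCD record.
bad = good + b"\x00" * 100_000

print(zipfile.is_zipfile(io.BytesIO(good)))  # True
print(zipfile.is_zipfile(io.BytesIO(bad)))   # False

# Cutting the data right after the EOCD record restores the original archive.
end = bad.rfind(EOCD_SIGNATURE) + EOCD_SIZE
print(zipfile.is_zipfile(io.BytesIO(bad[:end])))  # True
```

The last step works as written only because this archive has no comment; a real fix should also keep the comment bytes that the EOCD record declares.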

This is the code to solve the issue:

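    import struct

    EOCD_SIGNATURE = b'\x50\x4b\x05\x06'  # End of central directory signature
    EOCD_SIZE = 22                         # size of 'ZIP end of central directory record'

    def fixBadZipfile(zipFile):
        with open(zipFile, 'r+b') as f:
            data = f.read()
            # Search from the end: these 4 bytes may also occur inside compressed data
            pos = data.rfind(EOCD_SIGNATURE)
            if pos < 0 or len(data) < pos + EOCD_SIZE:
                raise ValueError("No end of central directory record found; file is truncated")
            # Keep the archive comment, if any; its length is in the record's last 2 bytes
            (comment_len,) = struct.unpack('<H', data[pos + 20:pos + EOCD_SIZE])
            end = min(pos + EOCD_SIZE + comment_len, len(data))
            print("Truncating file at location " + str(end) + ".")
            f.seek(end)
            f.truncate()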
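If you'd rather not rewrite the file on disk (for instance in the unit tests the question mentions), the same idea can be applied in memory: cut the data after the EOCD record and its comment, then hand zipfile a BytesIO. This is a sketch, not a drop-in replacement: open_zip_ignoring_trailing_data is a hypothetical name, it reads the whole file into memory (the archive in the question is about 200 MB compressed), and it assumes the last PK\x05\x06 in the file belongs to the real EOCD record.

```python
import io
import struct
import zipfile

EOCD_SIGNATURE = b"PK\x05\x06"  # end of central directory signature
EOCD_SIZE = 22                   # fixed size of the EOCD record


def open_zip_ignoring_trailing_data(fileobj) -> zipfile.ZipFile:
    """Hypothetical helper: open a zip whose EOCD record is followed by junk.

    Reads the whole binary file object into memory; the source is left untouched.
    """
    data = fileobj.read()
    # Search from the end: these 4 bytes may also occur inside compressed data.
    pos = data.rfind(EOCD_SIGNATURE)
    if pos < 0 or len(data) < pos + EOCD_SIZE:
        raise zipfile.BadZipFile("no end of central directory record found")
    # Keep the archive comment, if any; its length is in the record's last 2 bytes.
    (comment_len,) = struct.unpack("<H", data[pos + 20:pos + EOCD_SIZE])
    end = min(pos + EOCD_SIZE + comment_len, len(data))
    return zipfile.ZipFile(io.BytesIO(data[:end]))
```

Usage would look like: with open("myfile.zip", "rb") as f: zf = open_zip_ignoring_trailing_data(f). After that, zf.namelist() and zf.extractall() work as usual, and the original file is left unchanged.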