Java 7 is supposed to fix an old problem with unpacking zip archives whose entry names use a character set other than UTF-8. This can be achieved with the constructor ZipInputStream(InputStream, Charset). So far, so good. I can unpack a zip archive containing file names with umlauts in them when explicitly setting an ISO-8859-1 character set.
But here is the problem: when iterating over the stream using ZipInputStream.getNextEntry(), the entries have wrong special characters in their names. In my case the umlaut "ü" is replaced by a "?" character, which is obviously wrong. Does anybody know how to fix this? Obviously ZipEntry ignores the Charset of its underlying ZipInputStream. It looks like yet another zip-related JDK bug, but I might be doing something wrong as well.
...
zipStream = new ZipInputStream(
    new BufferedInputStream(new FileInputStream(archiveFile), BUFFER_SIZE),
    Charset.forName("ISO-8859-1")
);
while ((zipEntry = zipStream.getNextEntry()) != null) {
    // wrong name here, something like "M?nchen" instead of "München"
    System.out.println(zipEntry.getName());
    ...
}
I played around for about two hours, but just five minutes after I finally posted the question here, I bumped into the answer: the entry names in my zip file were not encoded with ISO-8859-1, but with Cp437. So the constructor call should be:
zipStream = new ZipInputStream(
    new BufferedInputStream(new FileInputStream(archiveFile), BUFFER_SIZE),
    Charset.forName("Cp437")
);
Now it works like a charm.
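For anyone who wants to reproduce the effect without my archive, here is a minimal sketch that builds a one-entry zip in memory with a Cp437-encoded name and then reads it back with both charsets. The class name ZipNameCharsetDemo and the helpers writeZip and firstEntryName are made up for this illustration, and it assumes your JDK ships the Cp437 (IBM437) charset, which standard builds normally do. In Cp437 the "ü" is stored as byte 0x81, which ISO-8859-1 decodes to an unprintable control character; that is presumably why a console shows "?".

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical demo class, not part of the original code.
class ZipNameCharsetDemo {

    // Builds a one-entry archive in memory; the entry name is encoded with nameCharset.
    static byte[] writeZip(String entryName, Charset nameCharset) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream out = new ZipOutputStream(bytes, nameCharset)) {
            out.putNextEntry(new ZipEntry(entryName));
            out.write("demo".getBytes(StandardCharsets.US_ASCII));
            out.closeEntry();
        }
        return bytes.toByteArray();
    }

    // Reads the archive and returns the first entry name, decoded with nameCharset.
    static String firstEntryName(byte[] zip, Charset nameCharset) throws IOException {
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(zip), nameCharset)) {
            ZipEntry entry = in.getNextEntry();
            return entry == null ? null : entry.getName();
        }
    }

    public static void main(String[] args) throws IOException {
        String original = "M\u00fcnchen.txt"; // "München.txt", escaped to be source-encoding safe
        byte[] zip = writeZip(original, Charset.forName("Cp437"));

        // Decoding with the charset the names were written in restores the umlaut.
        String good = firstEntryName(zip, Charset.forName("Cp437"));
        // Decoding with ISO-8859-1 turns byte 0x81 into U+0081 instead of "ü".
        String bad = firstEntryName(zip, StandardCharsets.ISO_8859_1);

        System.out.println("Cp437 matches original:      " + original.equals(good));
        System.out.println("ISO-8859-1 matches original: " + original.equals(bad));
    }
}
```

If the names in your archive still come out wrong with Cp437, the tool that created it may have used yet another code page, so it is worth trying the charset of the system the archive was created on.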