Read tar.gz in Java with Commons-compression

zpontikas · Sep 9, 2014

OK, so I want to read the contents of a tar.gz file (or an xz one, but that's the same thing). What I am doing is more or less this:

TarArchiveInputStream tarInput = new TarArchiveInputStream(new GzipCompressorInputStream(new FileInputStream("c://temp//test.tar.gz")));
TarArchiveEntry currentEntry = tarInput.getNextTarEntry();
BufferedReader br = null;
StringBuilder sb = new StringBuilder();
while (currentEntry != null) {
    File f = currentEntry.getFile();
    br = new BufferedReader(new FileReader(f));
    System.out.println("For File = " + currentEntry.getName());
    String line;
    while ((line = br.readLine()) != null) {
        System.out.println("line="+line);
    }
}
if (br!=null) {
    br.close();
}

But I get null when I call the getFile method of TarArchiveEntry.
I am using Apache Commons Compress 1.8.1.

Answer

Simone Gianni · Sep 9, 2014

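You can't use getFile on a TarArchiveEntry that was read from an archive. That getter is only meaningful for the opposite operation, when you are adding files from disk to a tar archive; for an entry read from a stream there is no file on disk, so it returns null.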

Instead, you should read directly from the TarArchiveInputStream. After each call to getNextTarEntry it returns the content of the current "file", decompressing it on the fly, and reports end-of-stream when that entry ends.

For example (untested code, YMMV) :

TarArchiveInputStream tarInput = new TarArchiveInputStream(new GzipCompressorInputStream(new FileInputStream("c://temp//test.tar.gz")));
try {
    TarArchiveEntry currentEntry = tarInput.getNextTarEntry();
    while (currentEntry != null) {
        // Read directly from tarInput; it signals end-of-stream at the end of the current entry.
        // Don't close br here: that would close tarInput as well.
        BufferedReader br = new BufferedReader(new InputStreamReader(tarInput));
        System.out.println("For File = " + currentEntry.getName());
        String line;
        while ((line = br.readLine()) != null) {
            System.out.println("line=" + line);
        }
        currentEntry = tarInput.getNextTarEntry(); // You forgot to iterate to the next file
    }
} finally {
    tarInput.close(); // Also closes the wrapped gzip and file streams
}
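
If you want to collect the lines rather than print them, the per-entry read can be pulled out into a small helper. This is only a sketch using the plain JDK: the class name EntryLines and the method readLines are made up for illustration, and it assumes the entries are text in a charset you know (UTF-8 in the demo). It takes any InputStream, so the demo feeds it a ByteArrayInputStream instead of a real archive.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Commons Compress.
final class EntryLines {

    // Reads lines from 'in' until it reports end-of-stream.
    // The reader is deliberately not closed: closing it would also close 'in',
    // which for a tar stream would end the iteration over the archive.
    public static List<String> readLines(InputStream in, Charset charset) throws IOException {
        BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset));
        List<String> lines = new ArrayList<>();
        String line;
        while ((line = reader.readLine()) != null) {
            lines.add(line);
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the content of a single archive entry.
        InputStream in = new ByteArrayInputStream(
                "first\nsecond\n".getBytes(StandardCharsets.UTF_8));
        System.out.println(readLines(in, StandardCharsets.UTF_8)); // prints [first, second]
    }
}
```

Inside the loop above you would call readLines(tarInput, StandardCharsets.UTF_8) once per entry, right after getNextTarEntry has returned a non-null entry.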