ZipFile and ZipArchive classes from System.IO.Compression and async I/O

Alexey Mitev · Sep 10, 2016

.NET 4.5 added new classes for working with zip archives. Now you can write something like this:

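using System.IO;
using System.IO.Compression; // on .NET Framework, ZipFile also needs a reference to System.IO.Compression.FileSystem.dll

using (ZipArchive archive = ZipFile.OpenRead(zipFilePath))
{
    foreach (ZipArchiveEntry entry in archive.Entries)
    {
        // Directory entries have an empty Name; ExtractToFile would throw on them
        if (string.IsNullOrEmpty(entry.Name))
            continue;

        // Extract it to the file
        entry.ExtractToFile(entry.Name);

        // or do whatever you want
        using (Stream stream = entry.Open())
        {
            // read the decompressed entry data from the stream here
        }
    }
}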

Obviously, if you work with large archives, it may take seconds or even minutes to read the files from the archive. So if you were writing a GUI app (WinForms or WPF), you would probably run such code on a separate thread; otherwise you would block the UI thread and make your users very upset.
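A minimal sketch of that "separate thread" approach, assuming you simply push the whole synchronous loop onto the thread pool with Task.Run. The class and method names (BackgroundUnzip, ExtractAllInBackgroundAsync) and the flattening of entries into one destination directory are illustrative choices, not part of System.IO.Compression:

```csharp
using System.IO;
using System.IO.Compression;
using System.Threading.Tasks;

// Hypothetical helper: runs the blocking extraction loop on a thread-pool
// thread so the calling (UI) thread is not blocked while it runs.
public static class BackgroundUnzip
{
    public static Task ExtractAllInBackgroundAsync(string zipFilePath, string destinationDir) =>
        Task.Run(() =>
        {
            using (ZipArchive archive = ZipFile.OpenRead(zipFilePath))
            {
                foreach (ZipArchiveEntry entry in archive.Entries)
                {
                    // Directory entries have an empty Name; skip them.
                    if (string.IsNullOrEmpty(entry.Name))
                        continue;

                    // Assumption: flatten the archive (keep only the file name)
                    // and overwrite files that already exist.
                    entry.ExtractToFile(Path.Combine(destinationDir, entry.Name), overwrite: true);
                }
            }
        });
}
```

From a WinForms or WPF event handler you would `await BackgroundUnzip.ExtractAllInBackgroundAsync(path, dir);`. Note that the I/O itself is still blocking; it just blocks a pool thread instead of the UI thread.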

However, all I/O operations in this code are executed in blocking mode, which is considered "not cool" in 2016. So there are two questions:

  1. Is it possible to get async I/O with the System.IO.Compression classes (or maybe with some other third-party .NET library)?
  2. Does it even make sense to do that? I mean, compression/extraction algorithms are very CPU-intensive anyway, so even if we switch from blocking I/O to async I/O, the performance gain may be relatively small (in percentage terms, of course, not in absolute values).

UPDATE:

In reply to Peter Duniho's answer: yes, you're right. For some reason I didn't think of this option:

using (Stream zipStream = entry.Open())
using (FileStream fileStream = new FileStream(...))
{
    await zipStream.CopyToAsync(fileStream);
}

which definitely works. Thanks!
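For reference, here is one way the CopyToAsync option might look as a complete loop. The FileStream arguments are assumptions chosen for this sketch (create/overwrite, 81920-byte buffer, useAsync: true so writes go through an async-enabled FileStream), and the AsyncCopyUnzip/ExtractAllAsync names are hypothetical. Only the write side is guaranteed to be opened for async I/O here; whether reads from the entry stream are truly asynchronous depends on the runtime's ZipArchive/DeflateStream implementation.

```csharp
using System.IO;
using System.IO.Compression;
using System.Threading.Tasks;

public static class AsyncCopyUnzip
{
    // Sketch: the archive is opened synchronously, but each entry's data is
    // copied to disk with CopyToAsync instead of ExtractToFile.
    public static async Task ExtractAllAsync(string zipFilePath, string destinationDir)
    {
        using (ZipArchive archive = ZipFile.OpenRead(zipFilePath))
        {
            foreach (ZipArchiveEntry entry in archive.Entries)
            {
                if (string.IsNullOrEmpty(entry.Name))
                    continue; // directory entry, nothing to write

                // Assumption: flatten paths and overwrite existing files (FileMode.Create).
                string target = Path.Combine(destinationDir, entry.Name);
                using (Stream zipStream = entry.Open())
                using (FileStream fileStream = new FileStream(
                    target, FileMode.Create, FileAccess.Write, FileShare.None,
                    bufferSize: 81920, useAsync: true))
                {
                    await zipStream.CopyToAsync(fileStream);
                }
            }
        }
    }
}
```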

By the way,

await Task.Run(() => entry.ExtractToFile(entry.Name));

is still a blocking I/O operation; it just runs on a separate thread and ties up a thread-pool thread for the duration of the I/O.

However, as far as I can see, the .NET developers still use blocking I/O for some archive operations (for example, the code that enumerates the entries in an archive: ZipArchive.cs on dotnet@github). I also found an open issue about the lack of asynchronous ZipFile APIs.
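One possible workaround for the blocking reads during entry enumeration, sketched here as an assumption rather than an official API: read the whole file into a MemoryStream asynchronously, then construct the ZipArchive over that buffer, so its internal synchronous reads hit memory instead of the disk. The BufferedZip/OpenReadBufferedAsync names are hypothetical, and the obvious trade-off is that the entire archive is held in memory, which may not be acceptable for very large files.

```csharp
using System.IO;
using System.IO.Compression;
using System.Threading.Tasks;

public static class BufferedZip
{
    // Workaround sketch: async file read, then in-memory zip parsing.
    public static async Task<ZipArchive> OpenReadBufferedAsync(string zipFilePath)
    {
        var buffer = new MemoryStream();
        using (var file = new FileStream(zipFilePath, FileMode.Open, FileAccess.Read,
                                         FileShare.Read, bufferSize: 81920, useAsync: true))
        {
            await file.CopyToAsync(buffer);
        }

        buffer.Position = 0; // rewind so ZipArchive reads from the start
        // leaveOpen: false, so disposing the archive also disposes the MemoryStream.
        return new ZipArchive(buffer, ZipArchiveMode.Read, leaveOpen: false);
    }
}
```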

I guess for now we have partial async support, but it is far from complete.

Answer

Peter Duniho · Sep 10, 2016
  1. Is it possible to get async I/O with System.IO.Compression classes (or maybe with some other third-party .NET library)?

Depending on what you actually mean by "async I/O", you can do it with the built-in .NET types. For example:

using System.IO;
using System.IO.Compression;
using System.Threading.Tasks;

// inside an async method
using (ZipArchive archive = await Task.Run(() => ZipFile.OpenRead(zipFilePath)))
{
    foreach (ZipArchiveEntry entry in archive.Entries)
    {
        // Extract it to the file
        await Task.Run(() => entry.ExtractToFile(entry.Name));

        // or do whatever you want
        using (Stream stream = entry.Open())
        {
            // use the XXXAsync() methods on the Stream object here,
            // e.g. ReadAsync() or CopyToAsync()
        }
    }
}

Wrap these in XXXAsync() extension methods if you like.
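A minimal sketch of such wrappers, assuming they do nothing more than the Task.Run offloading shown above. The names are hypothetical and deliberately end in OnThreadPoolAsync, both to make clear that the underlying I/O is still blocking and to avoid clashing with any built-in async methods a newer runtime might provide. Since ZipFile.OpenRead is a static method, it cannot be wrapped as an extension method, so it gets a plain static wrapper:

```csharp
using System.IO.Compression;
using System.Threading.Tasks;

// Hypothetical helpers; these are not part of System.IO.Compression.
public static class ZipTaskExtensions
{
    // Static wrapper: offloads the blocking ZipFile.OpenRead call.
    public static Task<ZipArchive> OpenReadOnThreadPoolAsync(string zipFilePath) =>
        Task.Run(() => ZipFile.OpenRead(zipFilePath));

    // Extension wrapper around ZipFileExtensions.ExtractToFile.
    public static Task ExtractToFileOnThreadPoolAsync(
        this ZipArchiveEntry entry, string destinationFileName, bool overwrite = false) =>
        Task.Run(() => entry.ExtractToFile(destinationFileName, overwrite));
}
```

Usage inside an async method: `using (ZipArchive archive = await ZipTaskExtensions.OpenReadOnThreadPoolAsync(path)) { foreach (var entry in archive.Entries) await entry.ExtractToFileOnThreadPoolAsync(entry.Name); }` (directory entries would still need to be skipped, as in the question).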

  2. Does it even make sense to do that? I mean, compression/extraction algorithms are very CPU-intensive anyway, so even if we switch from blocking I/O to async I/O, the performance gain may be relatively small (in percentage terms, of course, not in absolute values).

At least three reasons to do it:

  1. CPUs are very fast. In many cases, I/O is still the bottleneck, so asynchronously waiting on I/O is useful.
  2. Multi-core CPUs are the norm. So having one core working on decompression while another does other work is useful.
  3. Asynchronous operations are not entirely, and in some cases not at all, about performance. Asynchronously processing your archives allows a user interface to remain responsive, which is useful.