Ruby: Download zip file and extract

23tux · Oct 16, 2015 · Viewed 7.9k times

I have a Ruby script that downloads a remote ZIP file from a server using Ruby's open command. When I look at the downloaded content, it shows something like this:

PK\x03\x04\x14\x00\b\x00\b\x00\x9B\x84PG\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\n\x00\x10\x00foobar.txtUX\f\x00\x86\v!V\x85\v!V\xF6\x01\x14\x00K\xCB\xCFOJ,RH\x03S\\\x00PK\a\b\xC1\xC0\x1F\xE8\f\x00\x00\x00\x0E\x00\x00\x00PK\x01\x02\x15\x03\x14\x00\b\x00\b\x00\x9B\x84PG\xC1\xC0\x1F\xE8\f\x00\x00\x00\x0E\x00\x00\x00\n\x00\f\x00\x00\x00\x00\x00\x00\x00\x00@\xA4\x81\x00\x00\x00\x00foobar.txtUX\b\x00\x86\v!V\x85\v!VPK\x05\x06\x00\x00\x00\x00\x01\x00\x01\x00D\x00\x00\x00T\x00\x00\x00\x00\x00

I tried using the Rubyzip gem (https://github.com/rubyzip/rubyzip) along with its class Zip::ZipInputStream like this:

stream = open("http://localhost:3000/foobar.zip").read # this outputs the zip content from above
zip = Zip::ZipInputStream.new stream

Unfortunately, this throws an error:

 Failure/Error: zip = Zip::ZipInputStream.new stream
 ArgumentError:
   string contains null byte

My questions are:

  1. Is it possible, in general, to download a ZIP file and extract its content in-memory?
  2. Is Rubyzip the right library for it?
  3. If so, how can I extract the content?

Answer

23tux · Oct 16, 2015

I found the solution myself, and then found it again on Stack Overflow :D (How to iterate through an in-memory zip file in Ruby)

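require 'httparty'
require 'zip'      # Rubyzip; Zip::InputStream is the 1.0+ name of Zip::ZipInputStream
require 'stringio'

input = HTTParty.get("http://example.com/somedata.zip").body
Zip::InputStream.open(StringIO.new(input)) do |io|
  while (entry = io.get_next_entry)
    puts entry.name
    # parse_zip_content is an application-specific helper, not part of Rubyzip
    parse_zip_content io.read
  end
end
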
  1. Download the ZIP file. I'm using HTTParty for this, but you could also use Ruby's open command (require 'open-uri').
  2. Wrap the downloaded string in a stream using StringIO.new(input).
  3. Iterate over every entry in the ZIP archive using io.get_next_entry (it returns an instance of Zip::Entry, or nil when there are no more entries).
  4. With io.read you get the content of the current entry, and with entry.name you get its filename.