Can git treat zip files as directories and files inside the zip as blobs?

Jonas Heidelberg · Nov 3, 2011

The scenario

Imagine I am forced to work with files that are always stored inside .zip archives. Some of the files inside a zip are small text files that change often, while others are larger but luckily rather static (e.g. images).

If I place these zip files inside a git repository, each zip is treated as a single blob, so whenever I commit, the repository grows by the size of the zip file... even if only one small text file inside changed!
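To make the problem concrete, here is a minimal sketch of how git identifies blobs (the SHA-1 of a `blob <size>` header plus the content) applied to two versions of an archive. The helper names `git_blob_id` and `make_zip` and the sample file names are made up for illustration; this only shows object ids and says nothing about how well packfiles later compress the data.

```python
import hashlib
import io
import zipfile


def git_blob_id(data: bytes) -> str:
    """Return the SHA-1 object id git assigns to a blob with this content."""
    header = b"blob %d\x00" % len(data)
    return hashlib.sha1(header + data).hexdigest()


def make_zip(members: dict) -> bytes:
    """Build an in-memory zip archive from a {name: bytes} mapping."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, data in members.items():
            # Fixed timestamp so only the member contents differ between archives.
            info = zipfile.ZipInfo(name, date_time=(2011, 11, 3, 0, 0, 0))
            zf.writestr(info, data, compress_type=zipfile.ZIP_DEFLATED)
    return buf.getvalue()


image = bytes(range(256)) * 64  # stand-in for a large, static file
v1 = {"notes.txt": b"draft 1\n", "image.bin": image}
v2 = {"notes.txt": b"draft 2\n", "image.bin": image}

# Stored as zips: two different blobs, each containing the whole archive.
zip_ids = (git_blob_id(make_zip(v1)), git_blob_id(make_zip(v2)))

# Stored as separate files: the unchanged image keeps the same id, so git
# would store it once; only the small text file yields a new blob.
image_ids = (git_blob_id(v1["image.bin"]), git_blob_id(v2["image.bin"]))
notes_ids = (git_blob_id(v1["notes.txt"]), git_blob_id(v2["notes.txt"]))
```

Here `zip_ids` holds two different ids while `image_ids` holds the same id twice, which is exactly the saving that treating the zip as a directory would give.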

Why this is realistic

MS Word 2007/2010 .docx and Excel .xlsx files are ZIP files...

What I want

Is there, by any chance, a way to tell git to not treat zips as files, but rather as directories and treat their contents as files?

The advantages

But it couldn't work, you say?

I realize that without extra metadata this would lead to some amount of ambiguity: on a git checkout, git would have to decide whether to create foo.zip/bar.txt as a file in a regular directory named foo.zip or as an entry inside a zip file. However, this could be solved through config options, I would think.
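As a rough illustration of how a config option could resolve that ambiguity, the sketch below decides purely by path pattern. Everything in it is hypothetical: git has no such setting, and `ARCHIVE_PATTERNS` and `split_archive_path` are names invented for this example.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath

# Hypothetical setting: names matching these patterns are archives, not directories.
ARCHIVE_PATTERNS = ["*.zip", "*.docx", "*.xlsx"]


def split_archive_path(path, patterns=ARCHIVE_PATTERNS):
    """Split 'foo.zip/bar.txt' into ('foo.zip', 'bar.txt').

    Returns (None, path) when no ancestor component matches a pattern,
    meaning the path is an ordinary file in ordinary directories.
    """
    parts = PurePosixPath(path).parts
    # Only proper ancestors can be containers; the last part is the file itself.
    for i, part in enumerate(parts[:-1]):
        if any(fnmatch(part, pat) for pat in patterns):
            return "/".join(parts[: i + 1]), "/".join(parts[i + 1:])
    return None, path
```

With these assumed patterns, `split_archive_path("foo.zip/bar.txt")` yields the archive `foo.zip` and the member `bar.txt`, while a path such as `src/main.c` comes back with `None` as the archive. A real directory that happens to be called `foo.zip` would be misclassified, which is the ambiguity described above.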

Two ideas for how it could be done (if it doesn't exist yet)

  • using a library such as minizip or IO::Compress::Zip inside git
  • somehow adding a filesystem layer such that git actually sees zip files as directories to start with

Answer

Jeff Ferland · Nov 3, 2011

This doesn't exist, but it could easily exist in the current framework. Just as git behaves differently when displaying binary or ASCII files in a diff, it could be told to offer special treatment to certain file types through the configuration interface.

If you don't want to change the code base (although this is kind of a cool idea you've got), you could also script it yourself with pre-commit and post-checkout hooks: unzip the archives and store their contents on commit, then return them to their .zip state on checkout. You would have to restrict these actions to the files that were actually staged with git add.
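The two halves of such a script might look roughly like the sketch below. It is only an outline under assumptions: `explode` and `repack` are hypothetical helpers that a pre-commit and a post-checkout hook could call, the `foo.zip.d` directory naming is made up, and the logic for finding the staged zip files is left out.

```python
import os
import tempfile
import zipfile


def explode(zip_path: str, dest_dir: str) -> list:
    """Unpack zip_path into dest_dir and return the sorted member file names."""
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest_dir)
        return sorted(n for n in zf.namelist() if not n.endswith("/"))


def repack(src_dir: str, zip_path: str) -> None:
    """Rebuild zip_path from the files under src_dir."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, dirs, files in os.walk(src_dir):
            dirs.sort()  # walk subdirectories in a stable order
            for name in sorted(files):
                full = os.path.join(root, name)
                # Member names are relative to src_dir and use forward slashes.
                arcname = os.path.relpath(full, src_dir).replace(os.sep, "/")
                zf.write(full, arcname)


def round_trip_demo() -> dict:
    """Explode a throwaway archive, edit one member, repack, and read it back."""
    with tempfile.TemporaryDirectory() as tmp:
        archive = os.path.join(tmp, "foo.zip")
        tree = os.path.join(tmp, "foo.zip.d")  # hypothetical naming convention
        with zipfile.ZipFile(archive, "w") as zf:
            zf.writestr("bar.txt", b"old\n")
            zf.writestr("img/logo.bin", b"\x00\x01\x02")
        explode(archive, tree)
        with open(os.path.join(tree, "bar.txt"), "wb") as fh:
            fh.write(b"new\n")
        repack(tree, archive)
        with zipfile.ZipFile(archive) as zf:
            return {name: zf.read(name) for name in zf.namelist()}
```

One caveat with this approach: the rebuilt archive has the same contents but will generally not be byte-identical to the original, since timestamps, member order, and compression settings can differ.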

Either way is a bit of work -- it's just a question of whether the other git commands are aware of what's going on and play nicely.