How do I exclude files from git archive?

user743382 · Aug 3, 2016

Given a simple test repository with a single commit containing two files, a and b, I can get a list of specific files:

$ git ls-files a
a

Or a list of all files excluding specific files:

$ git ls-files . ':!b'
a

I can create an archive of specific files:

$ git archive HEAD a | tar tf -
a

But I cannot create an archive of all files excluding specific files:

$ git archive HEAD . ':!b' | tar tf -
a
b

Archiving an explicit list of files is not an option in my real repository, because that list exceeds the maximum command-line argument length.

I know I can list the files to exclude in .gitattributes using the export-ignore attribute, but the list is generated dynamically. I can update the file automatically, but git archive does not pick up the changes until they are committed.

Is there some other invocation that works without requiring another commit?

Answer

kostix · Aug 3, 2016

I think you almost nailed it: attributes can be read from several places, and .gitattributes is only the most common of them. Another one, which acts as per-repository configuration, is $GIT_DIR/info/attributes.

To cite the manual:

Note that attributes are by default taken from the .gitattributes files in the tree that is being archived. If you want to tweak the way the output is generated after the fact (e.g. you committed without adding an appropriate export-ignore in its .gitattributes), adjust the checked out .gitattributes file as necessary and use --worktree-attributes option. Alternatively you can keep necessary attributes that should apply while archiving any tree in your $GIT_DIR/info/attributes file.

So, if possible, write your generated list to $GIT_DIR/info/attributes (as export-ignore entries) and then run git archive.
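A minimal sketch of that, assuming a POSIX shell; the throwaway demo repository mirrors the one in the question, and the temporary list file stands in for your dynamically generated list (both are made up for illustration). Each listed path becomes an export-ignore line in $GIT_DIR/info/attributes, and git archive honours it without anything new being committed. Note that the sketch overwrites any rules already in that file.

```shell
#!/bin/sh
# Sketch: exclude a generated list of paths from `git archive` by writing
# export-ignore rules to $GIT_DIR/info/attributes, so nothing is committed.
set -eu

# Demo repository like the one in the question: one commit, files a and b.
repo=$(mktemp -d)
cd "$repo"
git init -q
echo a > a
echo b > b
git add a b
git -c user.name=demo -c user.email=demo@example.com commit -q -m init

# Hypothetical stand-in for the dynamically generated list (one path per
# line), kept outside the repository.
list=$(mktemp)
printf '%s\n' b > "$list"

# Turn every listed path into a "<pattern> export-ignore" line.
# Attribute lines are gitignore-style patterns, not literal paths: an
# unanchored name like "b" matches at any depth, and names with spaces or
# glob characters would need quoting/escaping, which this sketch skips.
# Caution: this overwrites any existing info/attributes file.
git_dir=$(git rev-parse --git-dir)
mkdir -p "$git_dir/info"
sed 's/$/ export-ignore/' "$list" > "$git_dir/info/attributes"

# HEAD is archived exactly as committed, yet the uncommitted rules apply.
git archive HEAD > out.tar
tar tf out.tar
```

The other route from the quoted manual is similar: append the same lines to the checked-out .gitattributes instead and run git archive --worktree-attributes HEAD.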

Another approach is not to use git archive at all and instead tar the work tree directly, passing tar the --exclude-from option, which reads exclusion patterns from a file. This wouldn't work with a bare repository as is, but if you're OK with checking files out before archiving them, you can do that with git read-tree and git checkout-index, setting the GIT_INDEX_FILE and GIT_WORK_TREE environment variables appropriately.
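A rough sketch of that checkout-then-tar route, assuming GNU tar and a POSIX shell; the demo repository and the exclusion list file are made up for illustration. A temporary index (GIT_INDEX_FILE) is filled from HEAD with git read-tree, git checkout-index writes its files into a scratch directory named by GIT_WORK_TREE, and tar then applies the list via --exclude-from. Since the repository's own index and work tree are never touched and GIT_DIR is set explicitly, the same two git commands should also work against a bare repository, although the demo uses a regular one.

```shell
#!/bin/sh
# Sketch: check HEAD out into a scratch directory through a throwaway index,
# then let tar skip paths listed in a file via --exclude-from (GNU tar).
set -eu

# Demo repository like the one in the question: one commit, files a and b.
repo=$(mktemp -d)
cd "$repo"
git init -q
echo a > a
echo b > b
git add a b
git -c user.name=demo -c user.email=demo@example.com commit -q -m init

# Hypothetical exclusion list: one tar pattern per line. Member names start
# with "./" because the scratch directory is archived as ".".
excludes=$(mktemp)
printf '%s\n' ./b > "$excludes"

export_dir=$(mktemp -d)
tmp_index=$(mktemp -d)/index                        # read-tree creates it
git_dir=$(cd "$(git rev-parse --git-dir)" && pwd)   # absolute path

# Fill the temporary index from HEAD, then write its files into $export_dir.
GIT_DIR=$git_dir GIT_INDEX_FILE=$tmp_index git read-tree HEAD
GIT_DIR=$git_dir GIT_INDEX_FILE=$tmp_index GIT_WORK_TREE=$export_dir \
    git checkout-index -a

# Archive the checked-out tree, honouring the exclusion list.
tar -C "$export_dir" --exclude-from="$excludes" -cf result.tar .
tar tf result.tar
```

Because tar reads the patterns from a file, the length of the exclusion list is no longer bounded by the command-line limit.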

Yet another workaround reverses the approach: tar (at least GNU tar) has a lesser-known --delete option that can remove members from an archive as it passes through a pipeline.

Basically, you can do

 $ tar -C a_path -c -f - . \
   | tar -f - --wildcards --delete '*.pdf' >result.tar

so that the first tar in the pipeline archives everything, while the second one passes everything through except for files matching the *.pdf shell glob pattern.

So if the files to delete can be specified with shell globs that fit within the command-line length limit, just pipe the output of git archive to a tar process that removes what is not needed.
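Applied to the question, that pipeline could look like the following sketch, assuming GNU tar; the demo repository (the question's a and b plus a made-up doc.pdf) and the patterns are illustrative only. git archive streams a tar archive of HEAD, and the second tar deletes matching members on the fly, so git itself never sees the exclusion list.

```shell
#!/bin/sh
# Sketch: stream `git archive` into GNU tar and delete unwanted members
# on the fly, so the excluded paths never have to be passed to git.
set -eu

# Demo repository: files a and b from the question plus a hypothetical doc.pdf.
repo=$(mktemp -d)
cd "$repo"
git init -q
echo a > a
echo b > b
echo pdf > doc.pdf
git add a b doc.pdf
git -c user.name=demo -c user.email=demo@example.com commit -q -m init

# The second tar reads the archive from stdin (-f -), drops every member
# matching the glob patterns, and writes the remaining archive to stdout.
git archive --format=tar HEAD \
    | tar -f - --wildcards --delete 'b' '*.pdf' > result.tar

tar tf result.tar
```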