Git filter-branch with --index-filter does not remove directories as expected

riper · Dec 12, 2012 · Viewed 8.6k times

Structure of Git repo foo in master branch

foo/refs/a.txt  
foo/bar/refs/b.txt  

In other branches, refs/ might appear in many other places.

Goal

To remove all instances of the directory refs (and their contents) from the Git history.

Environment: Windows 7 using Git Bash

Removing refs (Git not involved, tried this just to see that it works by itself)

find . -name refs -depth -exec rm -rf {} \;

Success: all refs/ directories and their contents are removed. (If I don't use -depth, find reports an error that the directories don't exist, even though they were removed correctly.)
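The effect of -depth can be reproduced outside Git. The following is a minimal sketch, assuming a throwaway directory created with mktemp; the demo variable and the file layout are illustrative only and simply mirror the structure described above.

```shell
# Hypothetical scratch layout mirroring the example structure above.
demo=$(mktemp -d)
mkdir -p "$demo/foo/refs" "$demo/foo/bar/refs"
touch "$demo/foo/refs/a.txt" "$demo/foo/bar/refs/b.txt"

# -depth makes find process a directory's contents before the directory
# itself, so find never tries to descend into a directory that rm has
# already deleted.
find "$demo" -depth -name refs -exec rm -rf {} \;

# Count the refs directories that are left.
remaining=$(find "$demo" -name refs | wc -l | tr -d ' ')
echo "refs directories remaining: $remaining"
```

Without -depth, find matches a refs directory, rm deletes it, and find then fails when it tries to enter the directory it has just matched, which would explain the error mentioned above.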

Removing refs from Git

git filter-branch --index-filter \
'find . -name refs -depth -exec git rm -rf --cached --ignore-unmatch {} \;' \
--prune-empty --tag-name-filter cat -- --all

[Screenshot: removing directory refs from Git by rewriting the Git history]

As can be seen in the picture (think of temp/a as temp/foo), the command runs through and rewrites all commits, but no refs/ directories are removed. So somehow the output of find is not returned to filter-branch --index-filter as expected.

Similar things seem to work for others.
What am I missing?

PS. Yes, I've read hundreds of posts, articles, etc. for hours and hours about this, but it still doesn't work for me.

Answer

user456814 · Jul 8, 2013

Update

Although my old answer apparently helped the original poster partially solve his problem, I may have been wrong to say that --index-filter only works with Git commands. The documentation for git filter-branch gives an example of the filter being used with non-Git shell commands (sed and mv) in addition to Git commands:

git filter-branch --index-filter \
        'git ls-files -s | sed "s-\t\"*-&newsubdir/-" |
                GIT_INDEX_FILE=$GIT_INDEX_FILE.new \
                        git update-index --index-info &&
         mv "$GIT_INDEX_FILE.new" "$GIT_INDEX_FILE"' HEAD

It may just be the case that if you're going to use non-Git commands with --index-filter, then they have to operate on the repository's index, as shown in the above example from the documentation.

So I'm not sure why the original poster's index filter didn't work. It might be that it tried to access a part of the repository that the index filter doesn't make available, or that the non-Git command it used didn't actually modify the index.
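If the filter does have to operate on the index, one possible way to find refs directories at any depth is to list paths from the index with git ls-files rather than from the file system with find. The following is only a sketch, not something tested against the original poster's repository: the scratch repository, the keep.txt file, and the repo and left variables are made up for illustration, and it assumes the GNU versions of grep and xargs (for -z and -r).

```shell
# Scratch repository with a layout like the one in the question
# (all names here are illustrative).
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .
mkdir -p refs bar/refs
echo a > refs/a.txt
echo b > bar/refs/b.txt
echo keep > bar/keep.txt
git add .

# The kind of pipeline an --index-filter could run: it reads paths from
# the index (git ls-files), not from the file system (find), and removes
# every entry that lives under a directory named refs.
git ls-files -z |
  grep -zE '(^|/)refs/' |
  xargs -0 -r git rm -q --cached --ignore-unmatch --

# Paths still present in the index after the removal.
left=$(git ls-files)
echo "$left"

# Inside filter-branch, the same pipeline would be passed as the filter string:
#   git filter-branch --index-filter '<pipeline above>' \
#     --prune-empty --tag-name-filter cat -- --all
```

Because git rm is called with --cached, only the index changes; the files in the scratch working directory are left alone, which matches how an index filter is meant to work.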

Also, as I point out in the comments,

Git actually stores all of its references under .git/refs/ in non-bare repositories, in the working copy root...so the command find . -name refs -depth will actually dig up those directories too.

So maybe that's causing something to go horribly wrong during the filter-branch?

Old Answer

I think the problem might be that you're trying to use non-Git shell tools with the filter-branch --index-filter option instead of the --tree-filter option:

git filter-branch --index-filter \
'find . -name refs -depth -exec git rm -rf --cached --ignore-unmatch {} \;' \
--prune-empty --tag-name-filter cat -- --all

Unlike --tree-filter, which checks out a new working directory for each commit and runs the passed shell script on it, --index-filter only operates on the index file of a Git repo itself (it doesn't check out a working copy to operate on)...so only Git commands will work with it.

That's probably why you had better luck with this, because it passed Git commands to filter-branch --index-filter:

git filter-branch --index-filter \
'git rm -f --cached --ignore-unmatch *.zip && \
 git rm -rf --cached --ignore-unmatch refs' \
--prune-empty --tag-name-filter cat -- --all

This is the documentation for git-filter-branch(1) --tree-filter:

This is the filter for rewriting the tree and its contents. The argument is evaluated in shell with the working directory set to the root of the checked out tree.

and this is the documentation for --index-filter (emphasis mine):

This is the filter for rewriting the index. It is similar to the tree filter but does not check out the tree, which makes it much faster. Frequently used with git rm --cached --ignore-unmatch ..., see EXAMPLES below. For hairy cases, see git-update-index(1).
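Going by the two descriptions above, the original find-based command would be expected to fit --tree-filter, since that filter runs in a checked-out tree where find has files to look at. The following is a hedged sketch on a throwaway repository; the repository contents, the user.name and user.email values, and the repo and files variables are assumptions for illustration. FILTER_BRANCH_SQUELCH_WARNING only silences the startup warning that newer Git versions print.

```shell
# Throwaway repository with one commit (all names are illustrative).
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .
git config user.name "Example"
git config user.email "example@example.invalid"
mkdir -p refs bar/refs
echo a > refs/a.txt
echo b > bar/refs/b.txt
echo keep > bar/keep.txt
git add .
git commit -q -m "initial"

# --tree-filter checks out each commit, so plain rm (not git rm) is enough;
# filter-branch records whatever the tree looks like after the filter runs.
FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch -f --tree-filter \
  'find . -depth -name refs -exec rm -rf {} \;' -- --all >/dev/null 2>&1

# Files recorded in the rewritten commit.
files=$(git ls-tree -r --name-only HEAD)
echo "$files"
```

The trade-off is the one the documentation points out: checking out every commit makes --tree-filter much slower than --index-filter on a large history.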