How to remove previously added git subtree and its history

jlconlin · Sep 29, 2014

Many moons ago I added a subtree to my git repository. This subtree included several folders and files. I added the subtree instead of creating a submodule (as recommended). Now I realize I only want one of the files in the subtree and none of the rest. Even worse, when others clone my repository, what they get is not what they expect: there is some conflict between the subtree and the other code that I've created.

I can get rid of the files/folders with

git rm -r subtree_folder1 subtree_folder2 subtree_files.*

However, I'm still left with a lengthy commit history from the subtree.

I've done a fair amount of development since I originally added the subtree and can't lose the commit history that I've generated.

In short this is what I would like:

  1. Remove all the subtree files/folders.
  2. Forget the history of all the subtree commits.
  3. Be left with only my code and my history.

Is this possible?

PS. One possible complication is that I moved the single header file I wanted to keep from the subtree to some folder in my code. I hope this is not what is keeping me from forgetting the subtree history.

An Attempt

After a fresh checkout from the remote server I have the following:

$ ls
.git             CMakeLists.txt   Read.cpp         logging.conf
.gitignore       ENDF6            TestData         src
.sparse-checkout LICENCE          doc              test
.travis.yml      README.md        include          tools

where .gitignore contains only build/ and debug/.

When I try the command as suggested I don't get a very happy response:

$ git filter-branch --index-filter 'git rm --cached -rf test tools src doc LICENCE README.md .travis.yml' HEAD
Rewrite 2fec85e41e40ae18efd1b130f55b14166a422c7f (1/1701)fatal: pathspec 'test' did not match any files
index filter failed: git rm --cached -rf test tools src doc LICENCE README.md .travis.yml

I'm not sure why it says it has a problem with test when it is clearly there. I'm baffled.

Answer

Andrew C · Oct 7, 2014

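You need to use git filter-branch with the --prune-empty option, which drops any commits that no longer introduce changes once the subtree paths have been removed. In the command below, --ignore-unmatch makes git rm succeed even in commits where a listed path does not exist yet; without it the filter aborts with "pathspec 'test' did not match any files", as in your attempt.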

git filter-branch --index-filter 'git rm --cached --ignore-unmatch -rf dir1 dir2 dirN file1 file2 fileN' --prune-empty -f HEAD
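To see how --ignore-unmatch and --prune-empty behave together, here is a minimal sketch that replays the command in a throwaway repository. The vendor directory, file names, and commit messages are made up for illustration and stand in for the subtree content. FILTER_BRANCH_SQUELCH_WARNING only silences the deprecation notice that newer Git versions print.

```shell
#!/bin/sh
set -e

# Silence the filter-branch deprecation notice in newer Git versions.
export FILTER_BRANCH_SQUELCH_WARNING=1
# Illustrative identity so commits work without any global git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Throwaway repository (hypothetical layout, not the asker's real one).
demo=$(mktemp -d)
cd "$demo"
git init -q .

# Commit 1: only the project's own code; vendor/ does not exist yet.
echo 'int main() { return 0; }' > main.cpp
git add main.cpp
git commit -q -m 'own code'

# Commit 2: stand-in for the unwanted subtree content.
mkdir vendor
echo 'third-party header' > vendor/lib.h
git add vendor
git commit -q -m 'add vendor files'

# Commit 3: more of the project's own work.
echo '// more work' >> main.cpp
git commit -q -am 'more own work'

# Rewrite history: remove vendor/ from the index of every commit.
# --ignore-unmatch keeps commit 1 from failing (vendor/ is absent there);
# --prune-empty drops commit 2, which is empty once vendor/ is gone.
git filter-branch -f --prune-empty \
    --index-filter 'git rm --cached --ignore-unmatch -rf vendor' HEAD

# Only the two commits that touch main.cpp remain.
git log --oneline
```

Trying a rewrite on a scratch copy like this first is a cheap way to check the path list before running it against 1701 real commits.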

After that, if you want to recover disk space, you will need to delete the original refs that filter-branch saved as backups under refs/original/, expire the reflog, and garbage collect.
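Those three cleanup steps could look roughly like the following. This is a sketch, not the only way to do it: purge_filter_branch_backups is a hypothetical helper name, not a git command, and it assumes you no longer need the backup refs. Once they are gone and the objects are pruned, the old history cannot be recovered from this repository.

```shell
# Hypothetical helper; the name is illustrative.
# Usage: purge_filter_branch_backups /path/to/rewritten/repo
purge_filter_branch_backups() {
    (
        # Run in a subshell so the caller's working directory is untouched.
        cd "$1" || exit 1

        # 1. Delete the backup refs filter-branch stored under refs/original/.
        git for-each-ref --format='%(refname)' refs/original/ |
        while read -r ref; do
            git update-ref -d "$ref"
        done

        # 2. Expire all reflog entries now, so they stop pinning old commits.
        git reflog expire --expire=now --all

        # 3. Garbage-collect and prune the now-unreachable objects.
        git gc --prune=now --quiet
    )
}
```

Comparing the output of du -sh .git before and after the call shows how much space was recovered.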