Remove all binary files recursively from a Git repo and its commit history

punkish · Jul 2, 2013

I have read a few different threads on removing large binary files from Git commit history, but my problem is a little different, so I am asking here to understand and confirm the steps:

My git repo is ~/foo. I want to remove all *.jpg, *.png, *.mp4, *.ogv (and so on) from one of the directories inside the repo, specifically from ~/foo/public/data.

Step 1. Remove the files

~/foo/data > find -E . -regex ".*\.(jpg|png|mp4|m4v|ogv|webm)" \
    -exec git filter-branch --force --index-filter \
    'git rm --cached --ignore-unmatch {}' \
    --prune-empty --tag-name-filter cat -- --all \;

Step 2. Add the binary file extensions to .gitignore and commit .gitignore

~/foo/data > cd ..
~/foo > git add .gitignore
~/foo > git commit -m "added binary files to .gitignore"

Step 3. Push everything

~/foo > git push origin master --force

Am I on the right track above? I want to measure twice before I cut once, so to speak.
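A read-only way to measure first: the sketch below lists every path under public/data with one of those extensions that appears in any commit on any ref, including files that were deleted long ago and are no longer on disk. It assumes it is run from the top level of ~/foo; the function name `list_binaries_in_history` is made up for illustration.

```shell
# Hypothetical helper: read-only preview, nothing is rewritten.
# Run from the top level of the working tree (~/foo).
list_binaries_in_history() {
  # --all: every branch and tag; --name-only: print the paths each commit
  # touched; the trailing pathspec limits that output to public/data.
  # core.quotePath=false keeps non-ASCII names unquoted so grep can match them.
  git -c core.quotePath=false log --all --pretty=format: --name-only -- public/data |
    grep -E '\.(jpg|png|mp4|m4v|ogv|webm)$' |
    sort -u
}
```

An empty result after the rewrite (ignoring any refs/original/ backups that filter-branch leaves behind) is a quick sign that the cleanup caught everything.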

Update: the above gives me this error:

You need to run this command from the toplevel of the working tree.
You need to run this command from the toplevel of the working tree.
..

So I went up the tree to the top level and re-ran the command, and it all worked.
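Even when run from the top level, `find` only sees files that are on disk right now, so binaries that were committed and later deleted are never passed to filter-branch, and `-exec` starts one complete history rewrite per file found. A single-pass sketch under the same assumptions (run from the top level of ~/foo, files under public/data; `purge_binaries` is a hypothetical name) lets `git rm` match the patterns itself in every commit:

```shell
# Hypothetical helper: remove the targeted binaries under public/data from
# every commit on every branch and tag in ONE history rewrite.
# Must be run from the top level of the working tree (~/foo).
purge_binaries() {
  # The patterns are double-quoted so the shell that evaluates the filter
  # leaves them alone and git matches them as pathspecs; in a git pathspec
  # "*" also matches "/", so subdirectories of public/data are covered.
  # --ignore-unmatch keeps commits without these files from failing.
  git filter-branch --force --index-filter '
    git rm --cached --quiet --ignore-unmatch -- \
      "public/data/*.jpg" "public/data/*.png" "public/data/*.mp4" \
      "public/data/*.m4v" "public/data/*.ogv" "public/data/*.webm"
  ' --prune-empty --tag-name-filter cat -- --all
}
```

Recent Git versions print a warning and pause before filter-branch runs; setting `FILTER_BRANCH_SQUELCH_WARNING=1` skips the pause.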

Answer

VonC · Jul 2, 2013

The process seems right.
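One detail in Step 3, though: Step 1 rewrites every branch and tag (`-- --all` with `--tag-name-filter cat`), so pushing only master leaves the other rewritten refs on the remote in their old form, binaries included. A sketch of Steps 2 and 3 that scopes the ignore rules to public/data and force-pushes all branches and tags, assuming the remote is named origin (`ignore_and_push` is a hypothetical name):

```shell
# Hypothetical helper for Steps 2 and 3. Run from the top level (~/foo)
# after the history rewrite; assumes the remote is called "origin".
ignore_and_push() {
  # Leading "/" anchors each pattern at the top level; "**" matches
  # public/data itself and any of its subdirectories.
  for ext in jpg png mp4 m4v ogv webm; do
    printf '/public/data/**/*.%s\n' "$ext"
  done >> .gitignore
  git add .gitignore
  git commit -m "Ignore binary files under public/data"
  # Force-push every rewritten branch and every tag, not only master.
  git push --force --all origin
  git push --force --tags origin
}
```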

You can also test your cleanup process with a tool like the BFG Repo-Cleaner, as in this answer:

java -jar bfg.jar --delete-files '*.{jpg,png,mp4,m4v,ogv,webm}' "$BARE_REPO_DIR"

(Note that by default BFG does not touch anything in your latest commit, so you first need to remove those files from the current index and make a "clean" commit. BFG will then clean all the earlier commits.)
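A sketch of that two-stage workflow, assuming bfg.jar sits in the current directory and that the second function is given a fresh bare mirror made with `git clone --mirror`; both function names are hypothetical. Keep in mind that `--delete-files` matches on file name, not on path, so BFG would also remove matching files outside public/data:

```shell
# Stage 1 (hypothetical helper), in a normal clone, from the top level:
# drop the files from the latest commit, since BFG protects HEAD.
# --cached removes them from the index only; the files stay on disk.
clean_head_commit() {
  git rm --cached --quiet --ignore-unmatch -- \
    "public/data/*.jpg" "public/data/*.png" "public/data/*.mp4" \
    "public/data/*.m4v" "public/data/*.ogv" "public/data/*.webm"
  git commit -m "Remove binary files from public/data"
}

# Stage 2 (hypothetical helper), after pushing stage 1 and mirroring:
# let BFG rewrite the older commits, then expire and delete the old objects.
bfg_clean_mirror() {
  local mirror=$1   # path to the bare mirror, e.g. foo.git (assumption)
  # Quoted so BFG, not the shell, expands the glob.
  java -jar bfg.jar --delete-files '*.{jpg,png,mp4,m4v,ogv,webm}' "$mirror"
  git -C "$mirror" reflog expire --expire=now --all
  git -C "$mirror" gc --prune=now --aggressive
  # A --mirror clone pushes all refs back to where it was cloned from.
  git -C "$mirror" push
}
```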

Update 2020: to remove files, you would now use git filter-repo (Git 2.22+, Q2 2019), since git filter-branch and BFG are, 7 years later, obsolete.

git filter-repo --path fileToRemove --invert-paths
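Adapted to the question, a sketch could select the paths with one anchored regular expression through `--path-regex` instead of listing them one by one. It assumes git filter-repo is installed (it ships separately from Git) and is run from the top level of a fresh clone of ~/foo; the function name is hypothetical, and `"$@"` passes through extra options such as `--force` when the clone is not fresh:

```shell
# Hypothetical helper: filter-repo equivalent of the question's Step 1.
purge_binaries_filter_repo() {
  # --path-regex selects the matching paths; --invert-paths keeps
  # everything EXCEPT them. The regex is anchored to public/data.
  git filter-repo --invert-paths \
    --path-regex '^public/data/.*\.(jpg|png|mp4|m4v|ogv|webm)$' "$@"
}
```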