Prune binary data from a git repository after the fact

Myrddin Emrys · Dec 30, 2010 · Viewed 24k times

I accidentally committed some large binary data into some commits. Since then I've updated my .gitignore, and those files are no longer being committed. But I'd like to go back into the older commits and selectively prune out this data from the repository, removing a couple directories that should have been in .gitignore. I don't want to remove the commits themselves.

How would I go about accomplishing this? My preferred method would be some way to retroactively apply the .gitignore rules to old commits... an answer that uses this method would also be pretty generally useful to others, since I'm sure my problem is not unique. It would also be quick to apply as a general solution, without much customization specific to each user's directory structure.

Is this possible, either the easy way I suggest above, or in some more complicated manner?

Answer

Ed. · Nov 18, 2013

The solution in this answer worked perfectly for me:

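You can also test your cleanup process with a tool like BFG Repo-Cleaner, as in this answer:

java -jar bfg.jar --delete-files '*.{jpg,png,mp4,m4v,ogv,webm}' ${bare-repo-dir}

Quote the pattern so that the shell passes it to BFG unchanged instead of expanding the braces and wildcard itself; ${bare-repo-dir} stands for the path to your bare clone.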

(Note that BFG by default protects your latest commit and won't delete anything from it, so you need to remove those files from the current index and make a "clean" commit first. BFG will then clean all the earlier commits.)
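
For the "clean" commit step, here is a minimal sketch using plain git, assuming a POSIX shell and a normal (non-bare) working clone whose .gitignore is already up to date. The function name `untrack_ignored` and its argument are hypothetical, made up for illustration; they are not part of git or BFG.

```shell
#!/bin/sh
# Sketch: stop tracking every file that the current ignore rules match,
# then record that in one commit. Files stay on disk; only index entries go.
# `untrack_ignored` is a hypothetical helper name, not a git command.
untrack_ignored() {
    repo_dir=$1   # assumed: path to a working (non-bare) clone
    (
        cd "$repo_dir" || exit 1
        # -c: tracked (cached) files, -i: only those matching ignore rules,
        # --exclude-standard: use .gitignore and the other standard excludes.
        if [ -n "$(git ls-files -ci --exclude-standard)" ]; then
            # -z / -0 keep paths with spaces or newlines intact.
            git ls-files -ci --exclude-standard -z |
                xargs -0 git rm --cached --quiet --
            git commit --quiet -m "Stop tracking files matched by .gitignore"
        fi
    )
}
```

Something like `untrack_ignored path/to/clone` (the path is a placeholder) would give you that clean latest commit. After pushing it, you would run BFG against a fresh bare clone as above.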