Splitting a set of files within a git repo into their own repository, preserving relevant history

jkeating · May 14, 2011

Possible Duplicate:
How to split a git repository while preserving subdirectories?

Some time ago I added my code to an existing git repo and have committed to it quite a lot since, while the other developer kept committing to the files that were already in the repo. Now I want to split my code off into its own repo while preserving all the change history for my particular files.

Reading through what others have done to split code out, I have been looking at git filter-branch with --index-filter or --tree-filter and rm commands for the files I don't care about. I don't want to use --subdirectory-filter, because the subdir holding my code should not become the topdir (we also shared one subdir). To complicate matters, some of the files from the original repository have moved around a bit over time, and some were created and later deleted. This makes designing an rm list a bit... challenging.

I'm looking for a way to filter everything /except/ a list of files/directories. Does anybody know of a way to do this?

Answer

jkeating · May 15, 2011

Just to close the loop on this so it appears as answered.

By using --index-filter or --tree-filter with inverted logic, you can indeed remove everything that doesn't match a narrow set of file names/directories: pipe git ls-tree into one or more grep -v invocations, then into xargs to run git rm. Here is the command I used to split out my particular files:

git filter-branch \
    --prune-empty \
    --index-filter '
        git ls-tree -z -r --name-only --full-tree $GIT_COMMIT \
        | grep -z -v "^src/pyfedpkg$" \
        | grep -z -v "^src/fedpkg" \
        | grep -z -v "^git-changelog" \
        | xargs -0 -r git rm --cached -r
    ' \
    -- \
    --all
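
Before rewriting history, it may help to dry-run the keep list on a plain list of paths and see what would be removed. The sketch below is only an illustration: the function name paths_to_remove and the sample paths are made up, and it reads newline-delimited input for readability, whereas the command above uses -z so that unusual file names survive. It folds the three grep -v stages into a single grep with several -e patterns, which drops any line matching at least one of them.

```shell
# Hypothetical helper: read paths on stdin (one per line) and print
# only those that match none of the keep patterns, i.e. the paths
# the index filter above would hand to "git rm --cached".
paths_to_remove() {
    grep -v -E \
        -e '^src/pyfedpkg$' \
        -e '^src/fedpkg' \
        -e '^git-changelog'
}

# Made-up sample tree; only the last two paths fall outside the keep list.
printf '%s\n' \
    src/pyfedpkg \
    src/fedpkg/__init__.py \
    git-changelog \
    README \
    src/other.py \
    | paths_to_remove
# prints:
# README
# src/other.py
```

In a real checkout, the input would come from something like git ls-tree -r --name-only --full-tree HEAD piped into the function. Note that ^src/fedpkg and ^git-changelog have no trailing anchor, so they keep every path that starts with those prefixes.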