Git: splitting a repository by subfolder and retaining all old branches

Sridhar · Dec 24, 2013 · Viewed 7.9k times

I have a Git repo with two directories and multiple branches. I want to split it into one repo per directory and recreate all the branches in each.

`-- Big-repo
    |-- dir1
    `-- dir2

Branches : branch1, branch2, branch3 ...

What I want

I want to split dir1 and dir2 into two separate repos and retain the branches branch1, branch2 ... in both repositories.

dir1
Branches : branch1, branch2, branch3 ...

dir2
Branches : branch1, branch2, branch3 ...

What I tried:

I am able to split them into 2 repos using

git subtree split -P dir1 -b dir1-only 
git subtree split -P dir2 -b dir2-only 

But this does not recreate the other branches after the split.

To get all branches:

# in Big-repo
git checkout branch1
git subtree split -P dir1 -b dir1-branch1

git checkout branch2
git subtree split -P dir1 -b dir1-branch2

Then I push these branches to the newly created repo.

This involves a lot of manual effort, and I am sure there must be a quicker way to achieve this.

Any ideas?

Answer

Maic López Sáenz · Dec 27, 2013

Short answer

git filter-branch offers exactly the functionality you want. With the --subdirectory-filter option it creates a new set of commits in which the contents of subDirectory sit at the root of the repository.

git filter-branch --prune-empty --subdirectory-filter subDirectory -- --branches
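To see the effect of this command on something disposable, here is a minimal sketch that builds a throwaway repository and rewrites it. The repository name, file names, and branch names are made up for illustration, and FILTER_BRANCH_SQUELCH_WARNING is only there to silence the notice that newer Git versions print before filter-branch runs.

```shell
#!/bin/sh
# Toy demonstration; every repository, file, and branch name here is made up.
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1   # silence the notice printed by newer Git

work=$(mktemp -d)
git init --quiet "$work/big-repo"
cd "$work/big-repo"
git config user.name "Example"
git config user.email "example@example.com"

# First commit touches both subdirectories
mkdir dir1 dir2
echo one > dir1/a.txt
echo two > dir2/b.txt
git add .
git commit --quiet -m "add dir1 and dir2"
git branch -M branch1

# A second branch with an extra commit inside dir1
git checkout --quiet -b branch2
echo more > dir1/c.txt
git add .
git commit --quiet -m "extend dir1 on branch2"

# Rewrite every local branch so that dir1 becomes the repository root
git filter-branch --prune-empty --subdirectory-filter dir1 -- --branches

# Collect the results: branch names and the files each branch now contains
branches=$(git for-each-ref --format='%(refname:short)' refs/heads/ | xargs)
files_branch1=$(git ls-tree -r --name-only branch1 | xargs)
files_branch2=$(git ls-tree -r --name-only branch2 | xargs)
echo "branches: $branches"
echo "branch1:  $files_branch1"
echo "branch2:  $files_branch2"
```

Both branches should survive the rewrite, with the files that used to live under dir1 now at the top level and nothing from dir2 left in either branch.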

Walkthrough

The following example shows how to do this safely. You need to repeat it for each subdirectory that will be isolated into its own repo, in this case dir1.

First clone your repository to keep the changes isolated:

git clone yourRemote dir1Clone
cd dir1Clone

To prepare the cloned repository we first delete the existing local branches, so that they can all be recreated from the remote ones. We detach HEAD first so that no branch is checked out. The grep skips the line starting with *, since that marks the current position, which in a detached HEAD state reads (no branch) or (HEAD detached at ...) instead of a branch name:

# detach HEAD
# in order to delete all branches without issues
git checkout --detach

# delete all branches
git branch | grep --invert-match "*" | xargs git branch -D

To recreate all remote branches locally we go through the results of git branch --remotes. We skip the ones containing -> since those are not branches:

# create a local tracking branch for each remote branch
git branch --remotes --no-color | grep --invert-match -e "->" | while read -r remote; do
    git checkout --track "$remote"
done

# remove remote and remote branches
git remote remove origin
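As an alternative to parsing the output of git branch --remotes, the branch recreation could also be sketched with git for-each-ref, which prints one full ref name per line. This is only an illustration under assumptions: the repositories "origin-repo" and "clone" and the branch names are invented, and --force is used so that an already existing local branch is reset instead of having to be deleted first.

```shell
#!/bin/sh
# Sketch only: "origin-repo", "clone", and the branch names are made-up examples.
set -e
work=$(mktemp -d)

# Build a small stand-in for the original repository with two branches
git init --quiet "$work/origin-repo"
cd "$work/origin-repo"
git config user.name "Example"
git config user.email "example@example.com"
echo hello > file.txt
git add file.txt
git commit --quiet -m "initial commit"
git branch -M branch1
git branch branch2

# Clone it and detach HEAD so that no local branch is checked out
git clone --quiet "$work/origin-repo" "$work/clone"
cd "$work/clone"
git checkout --quiet --detach

# Create (or reset) one local tracking branch per remote-tracking branch
git for-each-ref --format='%(refname)' refs/remotes/origin/ | while read -r ref; do
    name=${ref#refs/remotes/origin/}
    # origin/HEAD is a symbolic pointer, not a real branch, so skip it
    if [ "$name" = "HEAD" ]; then
        continue
    fi
    git branch --quiet --force --track "$name" "$ref"
done

local_branches=$(git for-each-ref --format='%(refname:short)' refs/heads/ | xargs)
echo "local branches: $local_branches"
```

Because git branch creates the branches without checking them out, the working tree is left untouched while the loop runs.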

Finally, run the filter-branch command. This rewrites every commit that touches the dir1 subdirectory into a new commit, and updates every branch whose history touches this subdirectory. The output will list all the references that were not updated, which is the case for branches that do not touch dir1 at all.

# Isolate dir1 and recreate branches
# --prune-empty removes all commits that do not modify dir1
# -- --all updates all existing references, which is all existing branches
git filter-branch --prune-empty --subdirectory-filter dir1 -- --all

After this you will have a new set of commits that have dir1 at the root of the repository. Just add your remote to push the new commits, or use these as a new repository altogether.
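Pushing all rewritten branches in one go could look like the sketch below. The local bare repository "dir1.git" stands in for whatever new remote you create, and "dir1-repo" stands in for the rewritten clone; both names are assumptions for the sake of a runnable example.

```shell
#!/bin/sh
# Sketch only: "dir1-repo" and "dir1.git" are placeholder names.
set -e
work=$(mktemp -d)

# Stand-in for the rewritten repository: two branches and no remote
git init --quiet "$work/dir1-repo"
cd "$work/dir1-repo"
git config user.name "Example"
git config user.email "example@example.com"
echo one > a.txt
git add a.txt
git commit --quiet -m "initial commit"
git branch -M branch1
git branch branch2

# Stand-in for the new, empty remote repository
git init --quiet --bare "$work/dir1.git"

# Add the remote and push every local branch, setting upstreams on the way
git remote add origin "$work/dir1.git"
git push --quiet --set-upstream origin --all

remote_branches=$(git --git-dir="$work/dir1.git" for-each-ref --format='%(refname:short)' refs/heads/ | xargs)
echo "branches on the new remote: $remote_branches"
```

Tags are not covered by --all; if the repository has any, they need a separate git push origin --tags.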

As an additional last step if you care about the repository size:

Even if all branches were updated, your repository will still contain all the objects of the original repository, though they are reachable only through the reflogs and the backup refs that filter-branch keeps under refs/original/. If you want to drop these, read how to garbage collect commits.
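One way the cleanup could go is sketched below on a throwaway repository: remove the backup refs that filter-branch leaves under refs/original/, expire the reflogs, and then garbage collect. The repository and file names are invented, and these commands permanently discard the old history, so treat this as an illustration to try on a clone rather than a recipe.

```shell
#!/bin/sh
# Sketch only: run on a disposable clone; this discards the pre-rewrite history.
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1   # silence the notice printed by newer Git
work=$(mktemp -d)

# Throwaway repository with two subdirectories (names are made up)
git init --quiet "$work/repo"
cd "$work/repo"
git config user.name "Example"
git config user.email "example@example.com"
mkdir dir1 dir2
echo one > dir1/a.txt
echo two > dir2/b.txt
git add .
git commit --quiet -m "add dir1 and dir2"
old_commit=$(git rev-parse HEAD)

git filter-branch --prune-empty --subdirectory-filter dir1 -- --all

# 1. Delete the backup refs that filter-branch stored under refs/original/
git for-each-ref --format='%(refname)' refs/original/ | xargs -n 1 git update-ref -d

# 2. Expire all reflog entries so they no longer keep old commits alive
git reflog expire --expire=now --all

# 3. Repack and prune every object that is now unreachable
git gc --quiet --prune=now

# Check what is left of the original history
backup_refs=$(git for-each-ref refs/original/ | wc -l | tr -d ' ')
if git cat-file -e "$old_commit" 2>/dev/null; then
    old_commit_present=yes
else
    old_commit_present=no
fi
echo "backup refs left: $backup_refs"
echo "old commit still present: $old_commit_present"
```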

Some additional resources: