How do I synchronise two remote Git repositories?

Danny Tuppeny · Feb 24, 2013

I have two repository URLs, and I want to synchronise them such that they both contain the same thing. In Mercurial, what I'm trying to do would be:

hg pull {repo1}
hg pull {repo2}
hg push -f {repo1}
hg push -f {repo2}

This will result in two heads in both repos. (I know it's not common to have two heads, but I'm doing this for synchronisation and it needs to be non-interactive. The heads will be merged manually from one of the repos and then the sync run again.)

I'd like to do the same thing in Git: with no user interaction, get all of the changes into both repos, with multiple branches/heads/whatever to be merged later. I'm trying to do this using URLs in the commands rather than adding remotes(?), as there could be a number of repos involved, and having aliases for them all will just make my script more complicated.

I'm currently cloning the repo using git clone --bare {repo1}, but I'm struggling to "update" it. I've tried git fetch {repo1}, but that doesn't seem to pull my changes down; git log still doesn't show the changeset that has been added in repo1.

I also tried using --mirror in my push and clone, but that seemed to remove changesets from repo2 that didn't exist locally, whereas I need to keep changes from both repos :/

What's the best way to do this?

Edit: To make it a little clearer what I'm trying to do...

I have two repositories (e.g. BitBucket and GitHub) and want people to be able to push to either (ultimately, one will be Git, one will be Mercurial, but let's assume they're both Git for now to simplify things). I need to be able to run a script that will "sync" the two repos in a way that they both contain both sets of changes, and may require merging manually later.

Eventually, this means I can just interact with one of the repos (e.g. the Mercurial one), and my script will periodically pull in Git changes which I can merge in, and then they'll be pushed back.

In Mercurial this is trivial! I just pull from both repos, and push with -f/--force to allow pushing multiple heads. Then anybody can clone one of the repos, merge the heads, and push back. I want to know how to do the closest similar thing in Git. It must be 100% non-interactive, and must keep both repos in a state where the process can be repeated indefinitely (that means no rewriting history, changing changesets, etc.).

Answer

Eevee · Feb 24, 2013

Git branches do not have "heads" in the Mercurial sense. There is only one thing called HEAD, and it's effectively a symlink to the branch (and therefore the commit) you currently have checked out. In the case of hosted repositories like GitHub, there is no commit checked out; there's just the repository history itself. (This is called a "bare" repo.)
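A quick way to see this for yourself, sketched under the assumption that `git` is on your PATH (the temp directory, the demo identity and the branch name `master` are only for illustration): in an ordinary repository `HEAD` names the current branch, after `--detach` it points straight at a commit, and a bare clone reports that it has no working tree at all.

```shell
set -e
# Demo identity so commits work even with no git config (assumption for the sketch).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

git init -q "$work/repo"
git -C "$work/repo" symbolic-ref HEAD refs/heads/master  # pin the name, whatever init.defaultBranch says
git -C "$work/repo" commit -q --allow-empty -m "first"

# HEAD normally refers to a branch, not directly to a commit.
head_ref=$(git -C "$work/repo" symbolic-ref HEAD)

# A bare copy, like the ones hosting services keep, has history but no checkout.
git clone -q --bare "$work/repo" "$work/hosted.git"
is_bare=$(git -C "$work/hosted.git" rev-parse --is-bare-repository)

# Detaching makes HEAD point straight at a commit, so it is no longer a symbolic ref.
git -C "$work/repo" checkout -q --detach
detached=$(git -C "$work/repo" symbolic-ref -q HEAD || echo detached)

echo "$head_ref $is_bare $detached"   # prints: refs/heads/master true detached
```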

The reason for this difference is that Git branch names are completely arbitrary; they don't have to match between copies of a repository, and you can create and destroy them on a whim.[1] Git branches are like Python variable names, which can be shuffled around and stuck to any value as you like; Mercurial branches are like C variables, which refer to fixed preallocated memory locations you then fill with data.
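To make the "variable name" analogy concrete, here is a minimal sketch in a throwaway repository (the branch names `trunk` and `development` are hypothetical): a branch is created, re-pointed, renamed and deleted, and none of that creates or loses a single commit.

```shell
set -e
# Demo identity so commits work even with no git config (assumption for the sketch).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
repo="$work/repo"

git init -q "$repo"
git -C "$repo" symbolic-ref HEAD refs/heads/master
git -C "$repo" commit -q --allow-empty -m "first"
git -C "$repo" commit -q --allow-empty -m "second"

# Stick a new name on the older commit, then move the same name to the newer one.
git -C "$repo" branch trunk master~1
git -C "$repo" branch -f trunk master

# Rename it, then throw it away; only the name changes, never the commits.
git -C "$repo" branch -m trunk development
git -C "$repo" branch -D development

commit_count=$(git -C "$repo" rev-list --count master)
```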

So when you pull in Mercurial, you have two histories for the same branch, because the branch name is a fixed meaningful thing in both repositories. The leaf of each history is a "head", and you'd normally merge them to create a single head.

But in Git, fetching a remote branch doesn't actually affect your branch at all. If you fetch the master branch from origin, it just goes into a branch called origin/master.[2] git pull origin master is just thin sugar for two steps: fetching the remote branch into origin/master, and then merging that other branch into your current branch. But they don't have to have the same name; your branch could be called development or trunk or whatever else. You can pull or merge any other branch into it, and you can push it to any other branch. Git doesn't care.
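A minimal sketch of those two steps, assuming `git` is available and using a local bare repository in a temp directory as a stand-in for `origin` (the clones `alice` and `bob` and the branch name `trunk` are hypothetical): after `bob` fetches, his own `master` is untouched and only `origin/master` has moved; the separate merge step is what updates a local branch, and that branch can be called anything.

```shell
set -e
# Demo identity so commits work even with no git config (assumption for the sketch).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

# A bare "origin" with one commit on master.
git init -q "$work/seed"
git -C "$work/seed" symbolic-ref HEAD refs/heads/master
git -C "$work/seed" commit -q --allow-empty -m "initial"
git clone -q --bare "$work/seed" "$work/origin.git"

git clone -q "$work/origin.git" "$work/alice"
git clone -q "$work/origin.git" "$work/bob"

# Alice publishes a new commit.
git -C "$work/alice" commit -q --allow-empty -m "alice's change"
git -C "$work/alice" push -q origin master

# Step 1: fetch. Bob's master stays put; only origin/master moves.
bob_before=$(git -C "$work/bob" rev-parse master)
git -C "$work/bob" fetch -q origin
fetched=$(git -C "$work/bob" rev-parse origin/master)

# Step 2: merge into whatever local branch you like, here one called trunk.
git -C "$work/bob" checkout -q -b trunk master
git -C "$work/bob" merge -q --no-edit origin/master
trunk_tip=$(git -C "$work/bob" rev-parse trunk)
```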

Which brings me back to your problem: you can't push a "second" branch head to a remote Git repository, because the concept doesn't exist. You could push to branches with mangled names (bitbucket_master?), but as far as I'm aware, you can't update a remote's remotes remotely.
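If you did want the mangled-name route, a rough sketch of it that also works with plain URLs rather than named remotes (as the question asks) might look like this. Two local bare repositories stand in for Bitbucket and GitHub, and the names `sync.git`, `bitbucket_master` and `github_master` are made up for the example. Each fetch passes an explicit refspec; a `git fetch {url}` with no refspec only records what it fetched in `FETCH_HEAD` and updates no branches, which is probably why the plain fetch in the question seemed to do nothing.

```shell
set -e
# Demo identity so commits work even with no git config (assumption for the sketch).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

# Two bare repositories with a shared first commit (stand-ins for the hosted copies).
git init -q "$work/seed"
git -C "$work/seed" symbolic-ref HEAD refs/heads/master
git -C "$work/seed" commit -q --allow-empty -m "shared"
git clone -q --bare "$work/seed" "$work/bitbucket.git"
git clone -q --bare "$work/seed" "$work/github.git"

# Let them diverge: one new commit pushed to each.
for side in bitbucket github; do
  git clone -q "$work/$side.git" "$work/$side-wc"
  git -C "$work/$side-wc" commit -q --allow-empty -m "work on $side"
  git -C "$work/$side-wc" push -q origin master
done

# The sync job: a scratch bare repo that addresses both sides by URL/path only.
bb="$work/bitbucket.git"
gh="$work/github.git"
git init -q --bare "$work/sync.git"
git -C "$work/sync.git" fetch -q "$bb" '+refs/heads/*:refs/remotes/bitbucket/*'
git -C "$work/sync.git" fetch -q "$gh" '+refs/heads/*:refs/remotes/github/*'

# Publish each side's master on the other side under a mangled name;
# neither repository's own master is touched.
git -C "$work/sync.git" push -q "$gh" refs/remotes/bitbucket/master:refs/heads/bitbucket_master
git -C "$work/sync.git" push -q "$bb" refs/remotes/github/master:refs/heads/github_master
```

Someone can then clone either repository and merge the mangled branch into `master` by hand; as long as nobody rewrites history, later runs only fast-forward the mangled branches.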

I don't think your plan makes a lot of sense, though: with unmerged branches exposed to both repositories, you'd either have to merge them both, or you'd merge one and then mirror it on top of the other... in which case you'd have left the second repository in a useless state for no reason.

Is there a reason you can't just do this:

  1. Pick a repository to be canonical—I assume BitBucket. Clone it. It becomes origin.

  2. Add the other repository as a remote called, say, github.

  3. Have a simple script periodically fetch both remotes and attempt to merge the github branch(es) into the origin branches. If the merge fails, abort and send you an email or whatever. If the merge is trivial, push the result to both remotes.
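Step 3 could be sketched roughly as below. It is not a hardened script: the function name `sync_once`, the remote name `github`, the local paths and the branch name `master` are assumptions, two local bare repositories stand in for Bitbucket and GitHub, and "send you an email" is replaced by a message on stderr plus a non-zero exit status that a cron wrapper could act on.

```shell
set -e
# Demo identity so merge commits work even with no git config (assumption for the sketch).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

# One sync pass over a working clone whose origin is the canonical repo
# and which has a second remote called "github" (hypothetical setup).
sync_once() {
  repo=$1
  git -C "$repo" fetch -q origin
  git -C "$repo" fetch -q github
  # Start from the canonical branch exactly as it is now.
  git -C "$repo" checkout -q -B master origin/master
  if git -C "$repo" merge -q --no-edit github/master; then
    # Clean merge (or nothing to merge): publish the result to both sides.
    git -C "$repo" push -q origin master
    git -C "$repo" push -q github master
  else
    # Conflict: leave both remotes untouched and ask a human to merge.
    git -C "$repo" merge --abort
    echo "sync_once: github/master does not merge cleanly into origin/master" >&2
    return 1
  fi
}

# Demo setup: a shared commit, then one non-conflicting commit on each side.
git init -q "$work/seed"
git -C "$work/seed" symbolic-ref HEAD refs/heads/master
git -C "$work/seed" commit -q --allow-empty -m "shared"
git clone -q --bare "$work/seed" "$work/bitbucket.git"
git clone -q --bare "$work/seed" "$work/github.git"
for side in bitbucket github; do
  git clone -q "$work/$side.git" "$work/$side-wc"
  git -C "$work/$side-wc" commit -q --allow-empty -m "work on $side"
  git -C "$work/$side-wc" push -q origin master
done

# Steps 1 and 2: clone the canonical repo and add the other one as a remote.
git clone -q "$work/bitbucket.git" "$work/sync"
git -C "$work/sync" remote add github "$work/github.git"

sync_once "$work/sync"
```

Resetting the local `master` to `origin/master` on every pass keeps the job stateless, so a failed run leaves nothing behind that the next run has to clean up.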

Of course, if you just do all your work on feature branches, this all becomes much less of a problem. :)


[1] It gets even better: you can merge together branches from different repositories that have no history whatsoever in common. I've done this to consolidate projects that were started separately; they used different directory structures, so it worked fine. GitHub uses a similar trick for its Pages feature: the history of your Pages is stored in a branch called gh-pages that lives in the same repository but has absolutely no history in common with the rest of your project.
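Both tricks can be sketched in a throwaway repository (the project names `alpha` and `beta` and the file contents are made up). One detail worth knowing: since Git 2.9, `git merge` refuses to join unrelated histories unless you pass `--allow-unrelated-histories`. A branch with no shared history, like `gh-pages`, can be started with `git checkout --orphan`.

```shell
set -e
# Demo identity so commits work even with no git config (assumption for the sketch).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

# Two projects started separately, each in its own top-level directory.
for p in alpha beta; do
  git init -q "$work/$p"
  git -C "$work/$p" symbolic-ref HEAD refs/heads/master
  mkdir "$work/$p/$p"
  echo "$p" > "$work/$p/$p/README"
  git -C "$work/$p" add "$p/README"
  git -C "$work/$p" commit -q -m "start $p"
done

# Consolidate beta into alpha: fetch its master and merge the unrelated history.
git -C "$work/alpha" fetch -q "$work/beta" master:refs/remotes/beta/master
git -C "$work/alpha" merge -q --no-edit --allow-unrelated-histories beta/master

# A pages-style branch that shares no history with master.
git -C "$work/alpha" checkout -q --orphan gh-pages
git -C "$work/alpha" rm -r -q -f .        # start from an empty tree
echo "<h1>docs</h1>" > "$work/alpha/index.html"
git -C "$work/alpha" add index.html
git -C "$work/alpha" commit -q -m "first pages commit"
git -C "$work/alpha" checkout -q master
```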

[2] This is a white lie. The branch is still called master, but it belongs to the remote called origin, and the slash is syntax for referring to it. The distinction can matter because Git has no qualms about slashes in branch names, so you could have a local branch named origin/master, and that would shadow the remote branch.
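The shadowing is easy to reproduce. A sketch, again in a throwaway directory with made-up repository names: the remote-tracking branch's full name is `refs/remotes/origin/master`, and once a local branch literally named `origin/master` exists, the short name resolves to the local branch instead (Git normally warns that the name is ambiguous; the warning is silenced here).

```shell
set -e
# Demo identity so commits work even with no git config (assumption for the sketch).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

git init -q "$work/upstream"
git -C "$work/upstream" symbolic-ref HEAD refs/heads/master
git -C "$work/upstream" commit -q --allow-empty -m "one"
git clone -q "$work/upstream" "$work/clone"
git -C "$work/upstream" commit -q --allow-empty -m "two"
git -C "$work/clone" fetch -q origin

# The remote-tracking branch lives under refs/remotes/, not refs/heads/.
full_name=$(git -C "$work/clone" rev-parse --symbolic-full-name origin/master)

# Slashes are legal in branch names, so this creates refs/heads/origin/master.
git -C "$work/clone" branch origin/master master

# refs/heads/ is searched before refs/remotes/, so the short name now means the local branch.
short=$(git -C "$work/clone" rev-parse origin/master 2>/dev/null)
remote_tip=$(git -C "$work/clone" rev-parse refs/remotes/origin/master)
local_master=$(git -C "$work/clone" rev-parse master)
```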