What is my bottleneck when cloning a git repository from a virtual machine with a fast network connection?

Thorbjørn Ravn Andersen · Nov 18, 2011 · Viewed 40.5k times

I have a situation with a relatively large git repository located in a virtual machine on an elderly, slow host on my local network, where it takes quite a while to do the initial clone.

ravn@bamboo:~/git$ git clone gitosis@gitbox:git00
Initialized empty Git repository in /home/ravn/git/git00/.git/
remote: Counting objects: 89973, done.
remote: Compressing objects: 100% (26745/26745), done.
remote: Total 89973 (delta 50970), reused 85013 (delta 47798)
Receiving objects: 100% (89973/89973), 349.86 MiB | 2.25 MiB/s, done.
Resolving deltas: 100% (50970/50970), done.
Checking out files: 100% (11722/11722), done.
ravn@bamboo:~/git$

There are no git-specific configuration changes in gitosis.

Is there any way of speeding up the receiving bit to what the network is capable of?


EDIT: I need the new repositories to be properly connected with the upstream repository. To my understanding this requires git to do the cloning, and thus raw bit copying outside of git will not work.

Answer

sehe · Nov 18, 2011

PS. Fair warning:

git is generally considered blazingly fast. You should try cloning a full repo from darcs, bazaar, hg (god forbid: TFS or subversion...). Also, if you routinely clone full repos from scratch, you'd be doing something wrong anyway. You can always just git remote update and get incremental changes.
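A minimal sketch of that incremental workflow, assuming the clone from the question already exists:

cd ~/git/git00
git remote update    # fetches only the new objects for all configured remotes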

For various other ways to keep full repos in sync see, e.g.

(These contain links to other relevant SO posts)

Dumb copy

As mentioned, you could just copy a repository with 'dumb' file transfer.

This will certainly not waste time compressing, repacking, deltifying and/or filtering.

Plus, you will get

  • hooks
  • config (remotes, push branches, settings (whitespace, merge, aliases, user details, etc.))
  • stashes (see Can I fetch a stash from a remote repo into a local branch? also)
  • rerere cache
  • reflogs
  • backups (from filter-branch, e.g.) and various other things (intermediate state from rebase, bisect etc.)

This may or may not be what you require, but it is nice to be aware of it.
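For example, given plain ssh/rsync access to the host (an assumption: gitosis itself only grants git access, and the actual on-disk path may differ), a dumb copy could be as simple as:

rsync -az gitosis@gitbox:git00/ ~/git/git00/    # copies the repository directory verbatim: objects, hooks, config, stashes, reflogs, ...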


Bundle

git clone by default optimizes for bandwidth. Since git clone, by default, does not mirror all branches (see --mirror), it would not make sense to just dump the pack files as-is (because that would possibly send way more than required).

When distributing to a truly big number of clients, consider using bundles.

If you want a fast clone without the server-side cost, the git way is bundle create. You can then distribute the bundle without the server even being involved. If you find that bundle ... --all includes more than a simple git clone would, consider e.g. bundle ... master to reduce the volume.

git bundle create snapshot.bundle --all # (or mention specific ref names instead of --all)

and distribute the snapshot bundle instead. That's the best of both worlds, though of course you won't get the items from the bullet list above. On the receiving end, just

git clone snapshot.bundle myclonedir/
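To address the edit in the question: a repository cloned from a bundle is a perfectly normal repository, so you can point it at the upstream afterwards. A sketch, using the remote URL from the question:

cd myclonedir
git remote set-url origin gitosis@gitbox:git00    # swap the bundle path for the real upstream URL
git fetch origin                                  # from here on, updates are incremental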

Compression configs

You can look at lowering server load by reducing/removing compression. Have a look at these config settings (I expect pack.compression is the one most likely to lower the server load):

core.compression

An integer -1..9, indicating a default compression level. -1 is the zlib default. 0 means no compression, and 1..9 are various speed/size tradeoffs, 9 being slowest. If set, this provides a default to other compression variables, such as core.loosecompression and pack.compression.

core.loosecompression

An integer -1..9, indicating the compression level for objects that are not in a pack file. -1 is the zlib default. 0 means no compression, and 1..9 are various speed/size tradeoffs, 9 being slowest. If not set, defaults to core.compression. If that is not set, defaults to 1 (best speed).

pack.compression

An integer -1..9, indicating the compression level for objects in a pack file. -1 is the zlib default. 0 means no compression, and 1..9 are various speed/size tradeoffs, 9 being slowest. If not set, defaults to core.compression. If that is not set, defaults to -1, the zlib default, which is "a default compromise between speed and compression (currently equivalent to level 6)."

Note that changing the compression level will not automatically recompress all existing objects. You can force recompression by passing the -F option to git-repack(1).
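As a sketch, on the server repository (level 0 is illustrative; anything in 0..9 trades CPU for size):

git config pack.compression 0    # no compression: less server CPU, more bytes on the wire
git repack -a -d -F              # recompress the existing objects at the new level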

Given ample network bandwidth, this will in fact result in faster clones. Don't forget about git-repack -F when you decide to benchmark that!