I have a 10 GB repo on a Linux machine whose filesystem is on NFS. The first `git status` takes 36 minutes and subsequent runs take 8 minutes, so Git seems to depend on the OS for caching file information. Only the first run of commands like `git commit` or `git status` that have to scan or repack the whole repo takes a very long time. I am not sure if you have used `git status` on such a large repo, but has anyone come across this issue?
I have tried `git gc`, `git clean`, and `git repack`, but the time taken is still almost the same. Would submodules, or some other approach such as breaking the repo into smaller ones, help? If so, which is the best way to split a large repo? Is there any other way to improve the time git commands take on a large repo?
To be more precise, Git depends on the efficiency of the `lstat(2)` system call, so tweaking your client's "attribute cache timeout" might do the trick.
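On a Linux NFS client that timeout is controlled by the `ac*` mount options. The sketch below is only an example (the server, export, and mount point are placeholders): raising `actimeo` lets repeated `lstat(2)` calls be answered from the client-side cache, at the price of seeing other clients' changes later.

```
# /etc/fstab entry (server, export and mount point are placeholders);
# actimeo=600 raises acregmin/acregmax/acdirmin/acdirmax to 600 seconds,
# so git's repeated lstat(2) calls can be served from the client-side cache.
nfsserver:/export/work  /mnt/work  nfs  rw,actimeo=600  0  0

# Or as a one-off mount from the shell:
sudo mount -t nfs -o rw,actimeo=600 nfsserver:/export/work /mnt/work
```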
The manual for `git-update-index`, essentially a manual mode for `git-status`, describes what you can do to alleviate this: use the `--assume-unchanged` flag to suppress its normal behaviour and manually update the paths that you have changed. You might even program your editor to unset this flag every time you save a file.
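A minimal sketch of that workflow (the paths here are only examples):

```
# Set the assume-unchanged bit so git status stops stat()ing this path
git update-index --assume-unchanged docs/huge-generated-file.txt

# Apply it to everything under a directory (example path)
git ls-files -z -- vendor/ | xargs -0 git update-index --assume-unchanged

# List paths that currently carry the bit (shown with a lowercase status letter)
git ls-files -v | grep '^[a-z]'

# Clear the bit again before you edit and commit the file
git update-index --no-assume-unchanged docs/huge-generated-file.txt
```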
The alternative, as you suggest, is to reduce the size of your checkout (the size of the packfiles doesn’t really come into play here). The options are a sparse checkout, submodules, or Google’s repo tool.
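As one hedged example of the sparse-checkout route (the directory names are placeholders), the classic setup looks like this; Git 2.25 and later wrap the same steps in the `git sparse-checkout` porcelain command:

```
# Enable sparse checkout and list the directories you actually work on
git config core.sparseCheckout true
printf 'src/\ndocs/\n' > .git/info/sparse-checkout

# Re-read the index so the working tree shrinks to the listed paths
git read-tree -mu HEAD

# On Git 2.25+ the equivalent is:
#   git sparse-checkout set src docs
```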
(There’s a mailing list thread about using Git with NFS, but it doesn’t answer many questions.)