This probably never happened in the real-world yet, and may never happen, but let's consider this: say you have a git repository, make a commit, and get very very unlucky: one of the blobs ends up having the same SHA-1 as another that is already in your repository. Question is, how would Git handle this? Simply fail? Find a way to link the two blobs and check which one is needed according to the context?
More a brain-teaser than an actual problem, but I found the issue interesting.
I did an experiment to find out exactly how Git would behave in this case. This is with version 2.7.9~rc0+next.20151210 (Debian version). I basically just reduced the hash size from 160-bit to 4-bit by applying the following diff and rebuilding git:
--- git-2.7.0~rc0+next.20151210.orig/block-sha1/sha1.c
+++ git-2.7.0~rc0+next.20151210/block-sha1/sha1.c
@@ -246,6 +246,8 @@ void blk_SHA1_Final(unsigned char hashou
blk_SHA1_Update(ctx, padlen, 8);
/* Output hash */
- for (i = 0; i < 5; i++)
- put_be32(hashout + i * 4, ctx->H[i]);
+ for (i = 0; i < 1; i++)
+ put_be32(hashout + i * 4, (ctx->H[i] & 0xf000000));
+ for (i = 1; i < 5; i++)
+ put_be32(hashout + i * 4, 0);
}
Then I did a few commits and noticed the following.
For #2 you will typically get an error like this when you run "git push":
error: object 0400000000000000000000000000000000000000 is a tree, not a blob
fatal: bad blob object
error: failed to push some refs to origin
or:
error: unable to read sha1 file of file.txt (0400000000000000000000000000000000000000)
if you delete the file and then run "git checkout file.txt".
For #4 and #6, you will typically get an error like this:
error: Trying to write non-commit object
f000000000000000000000000000000000000000 to branch refs/heads/master
fatal: cannot update HEAD ref
when running "git commit". In this case you can typically just type "git commit" again since this will create a new hash (because of the changed timestamp)
For #5 and #9, you will typically get an error like this:
fatal: 1000000000000000000000000000000000000000 is not a valid 'tree' object
when running "git commit"
If someone tries to clone your corrupt repository, they will typically see something like:
git clone (one repo with collided blob,
d000000000000000000000000000000000000000 is commit,
f000000000000000000000000000000000000000 is tree)
Cloning into 'clonedversion'...
done.
error: unable to read sha1 file of s (d000000000000000000000000000000000000000)
error: unable to read sha1 file of tullebukk
(f000000000000000000000000000000000000000)
fatal: unable to checkout working tree
warning: Clone succeeded, but checkout failed.
You can inspect what was checked out with 'git status'
and retry the checkout with 'git checkout -f HEAD'
What "worries" me is that in two cases (2,3) the repository becomes corrupt without any warnings, and in 3 cases (1,7,8), everything seems ok, but the repository content is different than what you expect it to be. People cloning or pulling will have a different content than what you have. The cases 4,5,6 and 9 are ok, since it will stop with an error. I suppose it would be better if it failed with an error at least in all cases.