Why Isn't There A Git Clone Specific Commit Option?

ivan.sim · Oct 1, 2014

In light of a recent question on SO, I am wondering why there isn't an option in git clone that makes the HEAD pointer of the newly created branch point to a specified commit. In the question mentioned above, the OP is trying to give his users instructions for cloning a specific commit.

Note that this question is not about how to clone a particular version using reset; it is about why no such option exists.

Answer

torek · Oct 1, 2014

Two answers so far (at the time I wrote this, now there are more) are correct in what they say, but don't really answer the "why" question. Of course, the "why" question is really hard to answer, except by the authors of the various bits of Git (and even then, what if two frequent Git contributors gave two different answers?).

Still, considering Git's "philosophy", as it were: in general, the various transfer protocols work by naming a reference. If they provide an SHA-1, it's the SHA-1 of that reference. For someone who does not already have direct (e.g., command-line) access to the repository, none[1] of the built-in commands allow one to refer to commits by ID. The closest thing I can find to a reason for this (and it is actually a good reason[2]) is this bit in the git upload-archive documentation:

SECURITY

In order to protect the privacy of objects that have been removed from history but may not yet have been pruned, git-upload-archive avoids serving archives for commits and trees that are not reachable from the repository's refs. However, because calculating object reachability is computationally expensive, git-upload-archive implements a stricter but easier-to-check set of rules ...

However, it goes on to say:

If the config option uploadArchive.allowUnreachable is true, these rules are ignored, and clients may use arbitrary sha1 expressions. This is useful if you do not care about the privacy of unreachable objects, or if your object database is already publicly available for access via non-smart-http.

which is particularly interesting since git clone gets all reachable objects in the first place, after which your local clone could trivially check out a commit by SHA-1 ID (and create a local branch name pointing to that ID if desired, or just leave your clone in "detached HEAD" mode).
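As a rough sketch of that clone-then-check-out sequence: the throwaway repository, the $work directory and the branch name pinned are all made up for illustration, and against a real server the local path would be replaced by its URL.

```shell
work=$(mktemp -d)

# Throwaway "remote" with two empty commits (hypothetical demo data).
git init -q "$work/remote"
git -C "$work/remote" symbolic-ref HEAD refs/heads/master
demo_commit() {
  git -C "$work/remote" -c user.name=demo -c user.email=demo@example.com \
    -c commit.gpgsign=false commit -q --allow-empty -m "$1"
}
demo_commit "first"
first=$(git -C "$work/remote" rev-parse HEAD)
demo_commit "second"

# The clone brings over every reachable object...
git clone -q "$work/remote" "$work/clone"
# ...so the older commit can be checked out by ID (detached HEAD),
git -C "$work/clone" checkout -q "$first"
# or given a local branch name of its own.
git -C "$work/clone" checkout -q -b pinned "$first"
```

After the last command the clone's HEAD is the branch pinned, sitting on the older commit, while origin/master still points at the newer one.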

Given these two cross-currents, I think the real answer to "why", at this point, is "nobody cares enough to add it". :-) The privacy argument is valid, but there is no reason that git clone could not check out a commit by ID after cloning, just as it can be told to check out some branch other than master[3] with git clone -b .... The only drawback to allowing -b sha1 is that Git cannot check up front (before the cloning process begins) whether sha1 will be received. It can check reference names, since those are transferred (along with their branch tips or other SHA-1 values) up front, so git clone -b nonexistentbranch ssh://... terminates quickly and does not create the copy:

fatal: Remote branch nonexistentbranch not found in upstream origin
fatal: The remote end hung up unexpectedly
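That up-front exchange can be observed with git ls-remote, which lists the advertised references and their tip IDs without fetching any commits. A small sketch against a throwaway local repository (the paths and variable names are illustrative assumptions, not anything from the question):

```shell
work=$(mktemp -d)

# Throwaway "remote" with two empty commits (hypothetical demo data).
git init -q "$work/remote"
git -C "$work/remote" symbolic-ref HEAD refs/heads/master
for msg in first second; do
  git -C "$work/remote" -c user.name=demo -c user.email=demo@example.com \
    -c commit.gpgsign=false commit -q --allow-empty -m "$msg"
done
older=$(git -C "$work/remote" rev-parse HEAD~1)

# Reference names and their tips are advertised before any objects move.
git ls-remote "$work/remote"

# So a branch name can be accepted or rejected immediately
# (--exit-code makes ls-remote return 2 when nothing matches).
git ls-remote --exit-code "$work/remote" refs/heads/master >/dev/null
have_master=$?
git ls-remote --exit-code "$work/remote" refs/heads/nonexistentbranch >/dev/null
have_missing=$?

# A commit that is not a reference tip never appears in that list,
# so there is nothing to look it up in before the objects arrive.
advertised=$(git ls-remote "$work/remote" | grep -c "^$older")
echo "lines advertising $older: $advertised"
```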

If -b allowed an ID, you'd get the whole clone, then it would have to tell you: "oh gosh, sorry, can't check out that ID, I'll leave you on master instead" or whatever. (Which is more or less what happens now with a busted submodule.)
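A sketch of what such an option would have to do: clone first, and only afterwards find out whether the ID is there. Here clone_at is an invented helper, not a Git command, and the repository and the all-digits ID are throwaway demo values.

```shell
# clone_at SOURCE DIR COMMIT_ID  (hypothetical helper, not part of Git)
clone_at() {
  git clone -q "$1" "$2" || return 1
  # The ID can only be checked once the objects are local.
  if git -C "$2" cat-file -e "$3^{commit}" 2>/dev/null; then
    git -C "$2" checkout -q "$3"
  else
    echo "warning: $3 not found; leaving the default branch checked out" >&2
    return 2
  fi
}

work=$(mktemp -d)
git init -q "$work/remote"
git -C "$work/remote" symbolic-ref HEAD refs/heads/master
for msg in first second; do
  git -C "$work/remote" -c user.name=demo -c user.email=demo@example.com \
    -c commit.gpgsign=false commit -q --allow-empty -m "$msg"
done
older=$(git -C "$work/remote" rev-parse HEAD~1)

# An ID that exists: the clone ends up on a detached HEAD at $older.
clone_at "$work/remote" "$work/good" "$older"
good_status=$?

# An ID that does not exist: the full clone happens anyway, then the fallback.
clone_at "$work/remote" "$work/bad" 0123456789012345678901234567890123456789
bad_status=$?
```

Note that the failing case still pays for the entire clone before it can report anything, which is exactly the drawback described above.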


[1] While git upload-archive now enforces this "privacy" rule, this was not always the case (it was introduced in version 1.7.8.1); and many (most?) git-web servers, including the one distributed with Git itself, allow viewing by arbitrary ID. This is probably why allowUnreachable was added to upload-archive a few years after the "only by ref name" code was added (but note that releases of Git after 1.7.8 and before 2.0.0 have no way to loosen the rules). Hence, while the "security" idea is valid, there was a period (pre-1.7.8.1) when it was not enforced.

[2] There are numerous ways to "leak" ostensibly private data out of a Git repository. A new file, Documentation/transfer-data-leaks, is about to appear in Git 2.11.1, while Git 2.11.0 added some internal features (see commit 722ff7f87 among others) to immediately drop objects pushed but not accepted. Such objects are eventually garbage-collected, but that leaves them exposed for the duration.

[3] Actually, by default git clone makes a local check-out of the branch it thinks goes with the remote's HEAD reference. Usually that's master anyway, though.
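To illustrate that last footnote, here is a small sketch with a throwaway repository whose HEAD names a branch called trunk (the repository, paths and branch name are invented for the demo; --symref needs a reasonably recent Git):

```shell
work=$(mktemp -d)
git init -q "$work/remote"
# Point the remote's HEAD at a branch that is not called master.
git -C "$work/remote" symbolic-ref HEAD refs/heads/trunk
git -C "$work/remote" -c user.name=demo -c user.email=demo@example.com \
  -c commit.gpgsign=false commit -q --allow-empty -m "first"

# The advertisement says which branch the remote's HEAD names...
git ls-remote --symref "$work/remote" HEAD
# ...and a plain clone checks out that branch.
git clone -q "$work/remote" "$work/clone"
git -C "$work/clone" symbolic-ref --short HEAD   # prints: trunk
```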