I can't seem to find a good explanation of this.
I know what git pull does:
1) a fetch, i.e. all the extra commits from the server are copied into the local repo and the origin/master branch pointer moves to the end of the commit chain;
2) a merge of the origin/master branch into the master branch, with the master branch pointer moving to the newly created commit while the origin/master pointer stays put.
I assume git push does something very similar, but I don't know for sure. I believe it does one of these, or something similar, or something else (?):
OR
I'm currently using git for basic operations so I'm doing fine, but I want to fully understand these internals.
Assuming you already understand git's "objects" model (your commits and files and so on are all just "objects in the git database", with "loose" objects, those not packed up to save space, stored in .git/objects/12/34567... and the like)...
You are correct: git fetch retrieves objects "they" (origin, in this case) have that you don't, and sticks labels on them: origin/master and the like. More specifically, your git calls up theirs on the Internet-phone (or any other suitable transport) and asks: what branches do you have, and what commit IDs are those? They have master and the ID is 1234567..., so your git asks for 1234567... and any other objects needed that you don't already have, and makes your origin/master point to commit object 1234567...
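To make that conversation concrete, here is a rough Python sketch of the fetch step. It is a toy model, not git's actual code: the dictionaries standing in for the object database and the reference table, the fetch function, and the commit IDs c1, c2, c3 are all made up for illustration.

```python
# Toy model: a "repository" is an object store (commit ID -> parent IDs)
# plus a table of references. All names and IDs here are made up.

def fetch(local_objects, local_refs, remote_objects, remote_refs, remote="origin"):
    """Copy missing commits from the remote and update remote-tracking labels."""
    for ref, tip in remote_refs.items():
        if not ref.startswith("refs/heads/"):
            continue  # this sketch only asks about their branches
        # Walk back from their branch tip, copying whatever we lack.
        stack = [tip]
        while stack:
            commit = stack.pop()
            if commit not in local_objects:
                local_objects[commit] = list(remote_objects[commit])
                stack.extend(remote_objects[commit])
        # Stick our label on it: their master becomes our origin/master.
        branch = ref[len("refs/heads/"):]
        local_refs["refs/remotes/" + remote + "/" + branch] = tip


theirs = {"c1": [], "c2": ["c1"], "c3": ["c2"]}
their_refs = {"refs/heads/master": "c3"}
mine = {"c1": []}
my_refs = {"refs/heads/master": "c1"}

fetch(mine, my_refs, theirs, their_refs)
print(my_refs["refs/remotes/origin/master"])  # c3
print(my_refs["refs/heads/master"])           # c1 (fetch never touches it)
```

Note that the fetch only adds objects and moves the remote-tracking label; your own master stays where it was until you merge.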
The part of git push that is symmetric here is this: your git calls up their git on the same Internet-phone as usual, but this time, instead of just asking them about their branches, your git tells them about your branches and your git repository objects, and then says: "How about I get you to set your master to 56789ab...?"
Their git takes a look at the objects you sent over (the new commit 56789ab... and whatever other objects you have that they didn't, that they would need to take it). Their git then considers the request to set their master to 56789ab...
As Chris K already answered, there is no merging happening here: your git simply proposes that their git overwrite their master with this new commit-ID. It's up to their git to decide whether to allow that.
If "they" (whoever they are) have not set up any special rules, the default rule that git uses here is very simple: the overwrite is allowed if the change is a "fast forward". It has one additional feature: the overwrite is also allowed if the change is done with the "force" flag set. It's usually not a good idea to set the force flag here, as the default rule, "only fast forwards", is usually the right rule.
The obvious question here is: what exactly is a fast forward? We'll get to that in a moment; first I need to expand a bit on labels, or "references" to be more formal.
In git, a branch, or a tag, or even things like the stash and HEAD are all references. Most of them are found in .git/refs/, a sub-directory of the git repository. (A few top-level references, including HEAD, are right in .git itself.) All a reference is, is a file[1] containing an SHA-1 ID like 7452b4b5786778d5d87f5c90a94fab8936502e20. SHA-1 IDs are cumbersome and impossible for people to remember, so we use names, like v2.1.0 (a tag in this case, version 2.1.0 of git itself) to save them for us.
Some references are, or at least are intended to be, totally static. The tag v2.1.0 should never refer to something other than the SHA-1 ID above. But some references are more dynamic. Specifically, your own local branches, like master, are moving targets. One special case, HEAD, is not even a target of its own: it generally contains the name of the moving-target branch. So there's one exception for "indirect" references: HEAD usually contains the string ref: refs/heads/master, or ref: refs/heads/branch, or something along those lines; and git does not (and cannot) enforce a "never change" rule for references. Branches in particular change a lot.
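Here is a small sketch of how an indirect reference like HEAD gets turned into a commit ID. It is only an illustration under assumptions: the refs dictionary stands in for the files under .git, the resolve function is hypothetical, and FAKE_ID is a made-up SHA-1.

```python
# Made-up 40-character ID, used only for illustration.
FAKE_ID = "0123456789abcdef0123456789abcdef01234567"

# Hypothetical in-memory stand-in for the reference files under .git/.
refs = {
    "HEAD": "ref: refs/heads/master",   # indirect: names another reference
    "refs/heads/master": FAKE_ID,       # direct: holds a commit ID
}


def resolve(name, refs, max_depth=5):
    """Follow 'ref: ...' indirections until a plain ID is found."""
    for _ in range(max_depth):
        value = refs[name]
        if not value.startswith("ref: "):
            return value
        name = value[len("ref: "):]
    raise ValueError("too many levels of indirection")


print(resolve("HEAD", refs))
```

When master moves to a new commit, HEAD does not need to change at all: it still says "whatever master is", which is the point of the indirection.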
How do you know if a reference is supposed to change? Well, a lot of this is just by convention: branches move and tags don't. But you should then ask: how do you know if a reference is a branch, or a tag, or what?
refs/heads/, refs/tags/, etc.
Other than the special top-level references, all of git's references are in refs/ as we already noted above. Within the refs/ directory (or "folder" if you're on Windows or Mac), though, we can have a whole collection of sub-directories. Git has, at this point, four well-defined subdirectories: refs/heads/ contains all your branches, refs/tags/ contains all your tags, refs/remotes/ contains all your "remote-tracking branches", and refs/notes/ contains git's "notes" (which I will ignore here as they get a bit complicated).
Since all your branches are in refs/heads/, git can tell that these should be allowed to change, and since all your tags are in refs/tags/, git can tell that these should not.
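The name-space idea boils down to a prefix check on the full reference name. A minimal sketch follows; the classify and should_stay_put helpers are made-up names, and this is not how git itself is written.

```python
# The four well-defined name-spaces mentioned above.
NAMESPACES = {
    "refs/heads/": "branch",
    "refs/tags/": "tag",
    "refs/remotes/": "remote-tracking branch",
    "refs/notes/": "note",
}


def classify(ref_name):
    """Say what kind of reference this is, judging only by its prefix."""
    for prefix, kind in NAMESPACES.items():
        if ref_name.startswith(prefix):
            return kind
    return "other"  # e.g. top-level references such as HEAD


def should_stay_put(ref_name):
    """By convention, tags are the references that are not supposed to move."""
    return classify(ref_name) == "tag"


print(classify("refs/heads/master"))       # branch
print(classify("refs/tags/v2.1.0"))        # tag
print(should_stay_put("refs/tags/v2.1.0")) # True
```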
When you make a new commit, and are on a branch like master, git will automatically move the reference. Your new commit is created with its "parent commit" being the previous branch-tip, and once your new commit is safely saved away, git changes master to contain the ID of the new commit. In other words, it makes sure that the branch name, the reference in the heads sub-directory, always points to the tip-most commit.
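The order of operations (save the commit first, then move the label) can be sketched like this. It is a toy model under assumptions: make_commit is a hypothetical helper, and the ID is a hash of a short string rather than of a real git commit object.

```python
import hashlib

commits = {}   # toy object database: commit ID -> list of parent IDs
refs = {}      # reference name -> commit ID


def make_commit(branch, message):
    """Create a commit on top of the current branch tip, then move the branch."""
    ref = "refs/heads/" + branch
    tip = refs.get(ref)
    parents = [tip] if tip else []
    # Toy ID: real git hashes the whole commit object, not just this string.
    commit_id = hashlib.sha1((message + "".join(parents)).encode()).hexdigest()
    commits[commit_id] = parents   # 1. the new commit is safely saved away
    refs[ref] = commit_id          # 2. only then does the branch name move
    return commit_id


first = make_commit("master", "initial")
second = make_commit("master", "second")
print(commits[second] == [first])             # True
print(refs["refs/heads/master"] == second)    # True
```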
(In fact, the branch, in the sense of a collection of commits that is part of the commit-graph stored in the repository, is a data structure made out of the commits in the repository. Its only connection with the branch name is that the tip commit of the branch itself is stored in the reference label with that name. This is important later, if and when branch names are changed or erased as the repository grows many more commits. For now it's just something to keep in mind: there's a difference between the "branch tip", which is where the "branch name" points, and the branch-as-a-subset-of-commit-DAG. It's a bit unfortunate that git tends to lump these different concepts under a single name, "branch".)
Usually you see "fast forward" in the context of merge, often with the merge done as the second step in a git pull. But "fast forwarding" is actually a property of a label move.
Let's draw a little bit of a commit graph. The little o nodes represent commits, and each one has an arrow pointing left, left-and-up, or left-and-down (or in one case, two arrows) to its parent (or parents). To be able to refer to three of them by name I'll give them uppercase letter names instead of o. Also, this character-based artwork doesn't have arrows, so you have to imagine them; just remember that they all point left or left-ish, just like the three names.
                o - A            <-- name1
              /
    o - o - o - o - B            <-- name2
          \       /
            o - C                <-- name3
When you ask git to change a reference, you simply ask it to stick a new commit ID into the label. In this case, these labels live in refs/heads/ and are thus branch names, so they are supposed to be able to take on new values.
If we tell git to put B into name1, we get this:
                o - A
              /
    o - o - o - o - B            <-- name1, name2
          \       /
            o - C                <-- name3
Note that commit A now has no name, and the o to the left of it is found only by finding A... which is hard since A has no name. Commit A has been abandoned, and these two commits have become eligible for "garbage collection". (In git, there's a "ghost name" left behind in the "reflog", that keeps the branch with A around for 30 days in general. But that's a different topic entirely.)
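"Abandoned" just means "not reachable from any label". Here is a rough sketch of that check on the graph as it stands now (name1 and name2 on B, name3 still on C). The lowercase commit names o1 to o4, u and l are made up so the unnamed o nodes can be written down, and the reflog is ignored.

```python
# Parent links for the graph above (child -> list of parents).
parents = {
    "o1": [], "o2": ["o1"], "o3": ["o2"], "o4": ["o3"],
    "B": ["o4", "C"],          # the one commit with two parents
    "u": ["o3"], "A": ["u"],   # upper line: o - A
    "l": ["o2"], "C": ["l"],   # lower line: o - C
}


def reachable(tips, parents):
    """Every commit found by starting at the tips and walking to parents."""
    seen = set()
    stack = list(tips)
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return seen


labels = {"name1": "B", "name2": "B", "name3": "C"}
abandoned = set(parents) - reachable(labels.values(), parents)
print(sorted(abandoned))  # ['A', 'u']
```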
What about telling git to put B into name3? If we do that next, we get this:
                o - A
              /
    o - o - o - o - B            <-- name1, name2, name3
          \       /
            o - C
Here, commit C still has a way to find it: start at B and work down-and-left, to its other (second) parent commit, and you find commit C. So commit C is not abandoned.
Updating name1 like this is not a fast-forward, but updating name3 is.
More specifically, a reference-change is a "fast forward" if and only if the object (usually a commit) that the reference used to point to is still reachable by starting from the new place and working backwards, along all possible backwards paths. In graph terms, it's a fast-forward if the old node is an ancestor of the new one.
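The ancestor test is short enough to write out. This is a sketch of the definition, not git's implementation; the lowercase commit names are made up for the unnamed o nodes in the drawings, and is_fast_forward is a hypothetical helper.

```python
# Same shape as the drawings: child -> list of parents.
parents = {
    "o1": [], "o2": ["o1"], "o3": ["o2"], "o4": ["o3"],
    "B": ["o4", "C"],
    "u": ["o3"], "A": ["u"],
    "l": ["o2"], "C": ["l"],
}


def ancestors(commit, parents):
    """The commit itself plus everything found by walking back along all parent links."""
    found = set()
    stack = [commit]
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(parents[current])
    return found


def is_fast_forward(old, new, parents):
    """A label move from old to new is a fast-forward if old is still reachable from new."""
    return old in ancestors(new, parents)


print(is_fast_forward("A", "B", parents))  # False: moving name1 abandons A
print(is_fast_forward("C", "B", parents))  # True: C is B's second parent
```

In this sketch a move from a commit to itself counts as a (trivial) fast-forward, since nothing is abandoned.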
Making your push be a fast-forward, by merging
Branch-name fast-forwards occur when the only thing you do is add new commits; but also when, if you've added new commits, you've also merged-in whatever new commits someone else added. That is, suppose your repo has this in it, after you've made one new commit:
                 o      <-- master
               /
    ...- o - o          <-- origin/master
At this point, moving origin/master "up and right" would be a fast-forward. However, someone else comes along and updates the other (origin) repo, so you do a git fetch and get a new commit from them. Your git moves your origin/master label (in a fast-forward operation on your repo, as it happens):
                 o      <-- master
               /
    ...- o - o - o      <-- origin/master
At this point, moving origin/master to master would not be a fast-forward, as it would abandon that one new commit.
You, however, can do a git merge origin/master operation to make a new commit on your master, with two parent commit IDs. Let's label this one M (for merge):
                 o - M  <-- master
               /   /
    ...- o - o - o      <-- origin/master
You can now git push this back to origin and ask them to set their master (which you are calling origin/master) equal to your (new) M, because for them, that's now a fast-forward operation!
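Putting the pieces together, here is a rough simulation of the receiving side of a push under the default rule (fast-forwards only, unless forced). Everything in it is an assumption for illustration: receive_push is a made-up function, the commit names p1, p2, mine, theirs and M are invented, and one shared parents table stands in for the objects your git would actually send over.

```python
def is_ancestor(old, new, parents):
    """True if old can be reached from new by following parent links."""
    stack, seen = [new], set()
    while stack:
        commit = stack.pop()
        if commit == old:
            return True
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return False


def receive_push(their_refs, parents, ref, new_id, force=False):
    """Their git decides whether to overwrite ref with new_id."""
    old_id = their_refs.get(ref)
    if old_id is None or force or is_ancestor(old_id, new_id, parents):
        their_refs[ref] = new_id
        return True
    return False  # rejected: not a fast-forward


# You committed "mine" while someone else pushed "theirs", both on top of p2.
parents = {"p1": [], "p2": ["p1"], "mine": ["p2"], "theirs": ["p2"]}
their_refs = {"refs/heads/master": "theirs"}

first_try = receive_push(their_refs, parents, "refs/heads/master", "mine")

# After fetching, you merge: M has two parents, yours and theirs.
parents["M"] = ["mine", "theirs"]
second_try = receive_push(their_refs, parents, "refs/heads/master", "M")

print(first_try, second_try)  # False True
```

Passing force=True on the first try would also have succeeded, but it would have abandoned the other person's commit on their side, which is why the default rule is usually the right one.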
Note that you can also do a git rebase, but let's leave that for a different Stack Overflow posting. :-)
[1] In fact, git references always start out as individual files in various sub-directories, but if a reference doesn't get updated for a long while, it tends to get "packed" (along with all the other mostly-static references) into a single file full of packed references. This is just a time-saving optimization, and the key here is not to depend on the exact implementation, but rather to use git's rev-parse and update-ref commands to extract the current SHA-1 from a reference, or update a reference to contain a new SHA-1.