Given the following branch structure:
*------*---*
Master      \
             *---*--*------*
             A       \
                      *-----*-----*
                      B (HEAD)
If I want to merge my B changes (and only my B changes, no A changes) into master, what is the difference between these two sets of commands?
>(B) git rebase master
>(B) git checkout master
>(master) git merge B

>(B) git rebase --onto master A B
>(B) git checkout master
>(master) git merge B
I'm mainly interested in learning if code from Branch A could make it into master if I use the first way.
Bear with me for a while before I answer the question as asked. One of the earlier answers is right, but there are labeling and other relatively minor (but potentially confusing) issues, so I want to start with branch drawings and branch labels. Also, people coming from other systems, or maybe even just new to revision control and git, often think of branches as "lines of development" rather than "traces of history" (git implements them as the latter, rather than the former, so a commit is not necessarily on any specific "line of development").
First, there is a minor problem with the way you drew your graph:
*------*---*
Master      \
             *---*--*------*
             A       \
                      *-----*-----*
                      B (HEAD)
Here's the exact same graph, but with the labels drawn in differently and some more arrow-heads added (and I've numbered the commit nodes for use below):
0 <- 1 <- 2 <-------------------- master
           \
            3 <- 4 <- 5 <- 6 <------ A
                       \
                        7 <- 8 <- 9 <-- HEAD=B
Why this matters is that git is quite loose about what it means for a commit to be "on" some branch—or perhaps a better phrase is to say that some commit is "contained in" some set of branches. Commits cannot be moved or changed, but branch labels can and do move.
More specifically, a branch name like master, A, or B points to one specific commit. In this case, master points to commit 2, A points to commit 6, and B points to commit 9. The first few commits 0 through 2 are contained within all three branches; commits 3, 4, and 5 are contained within both A and B; commit 6 is contained only within A; and commits 7 through 9 are contained only in B. (Incidentally, multiple names can point to the same commit, and that's normal when you make a new branch.)
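To make "contained in" concrete, here is a minimal Python sketch that models the numbered graph as a parent table and asks which branch names can reach each commit. The names PARENTS, BRANCHES, reachable, and containing_branches are made up for this illustration; git's real data structures are different.

```python
# Hypothetical model of the numbered graph: commit -> parent (None for the root).
PARENTS = {0: None, 1: 0, 2: 1, 3: 2, 4: 3, 5: 4, 6: 5, 7: 5, 8: 7, 9: 8}

# Branch names are just labels pointing at one commit each.
BRANCHES = {"master": 2, "A": 6, "B": 9}


def reachable(tip):
    """Return the set of commits found by following parent links from tip."""
    seen = set()
    while tip is not None:
        seen.add(tip)
        tip = PARENTS[tip]
    return seen


def containing_branches(commit):
    """Return the sorted names of the branches whose history includes commit."""
    return sorted(name for name, tip in BRANCHES.items() if commit in reachable(tip))


for commit in sorted(PARENTS):
    print(commit, containing_branches(commit))
```

In this model, commit 4 reports A and B, commit 6 reports only A, and commit 8 reports only B, matching the description above.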
Before we proceed, let me re-draw the graph yet one more way:
0
 \
  1
   \
    2 <-- master
     \
      3 - 4 - 5
              |\
              | 6 <-- A
               \
                7
                 \
                  8
                   \
                    9 <-- HEAD=B
This just emphasizes that it's not a horizontal line of commits that matters, but rather the parent/child relationships. The branch label points to a starting commit, and then (at least the way these graphs are drawn) we move left, maybe also going up or down as needed, to find parent commits.
When you rebase commits, you're actually copying those commits.
There's one "true name" for any commit (or indeed any object in a git repository), which is its SHA-1: that 40-hex-digit string like 9f317ce... that you see in git log, for instance. The SHA-1 is a cryptographic[1] checksum of the contents of the object. The contents are the author and committer (name and email), time stamps, a source tree, and the list of parent commits. The parent of commit #7 is always commit #5. If you make a mostly-exact copy of commit #7, but set its parent to commit #2 instead of commit #5, you get a different commit with a different ID. (I've run out of single digits at this point. Normally I use single uppercase letters to represent commit IDs, but with branches named A and B I thought that would be confusing. So I'll call a copy of #7, #7a, below.)
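As a rough illustration of why changing the parent changes the ID, the sketch below hashes a simplified commit record with SHA-1. The record layout, the commit_id helper, and the placeholder tree, parent, and author strings are all assumptions made for this example; git hashes its own object format, not this one.

```python
import hashlib


def commit_id(tree, parent, author, message):
    """Hash a simplified commit record (not git's actual object format)."""
    record = f"tree {tree}\nparent {parent}\nauthor {author}\n\n{message}\n"
    return hashlib.sha1(record.encode("utf-8")).hexdigest()


# Same tree, author, and message; only the parent differs.
original = commit_id("tree-of-7", "commit-5", "someone <someone@example.com>", "change seven")
copy = commit_id("tree-of-7", "commit-2", "someone <someone@example.com>", "change seven")

print(len(original))     # a SHA-1 hex digest is 40 characters long
print(original != copy)  # the parent is part of the hashed content, so the IDs differ
```

The point is only that the parent is part of what gets hashed, so a copy with a new parent cannot keep the old ID.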
What git rebase does
When you ask git to rebase a chain of commits (such as commits #7-8-9 above), it has to copy them, at least if they're going to move anywhere (if they're not moving it can just leave the originals in place). It defaults to copying commits from the currently-checked-out branch, so git rebase needs just two extra pieces of information: which commits to copy, and where the copies should land (the target).
When you run git rebase <upstream>, you let git figure out both parts from one single piece of information. When you use --onto, you get to tell git about the two parts separately: you still supply an <upstream>, but git doesn't compute the target from <upstream>; it only computes the commits to copy from <upstream>. (Incidentally, I think <upstream> is not a good name, but it's what rebase uses and I don't have anything much better, so let's stick with it here. Rebase calls the target <newbase>, but I think target is a much better name.)
Let's take a look at these two options first. Both assume that you're on branch B in the first place:
git rebase master
git rebase --onto master A
With the first command, the <upstream> argument to rebase is master. With the second, it's A.
Here's how git computes which commits to copy: it hands the current branch to git rev-list, and it also hands <upstream> to git rev-list, but using --not (or more precisely, with the equivalent of the two-dot exclude..include notation). This means we need to know how git rev-list works.
While git rev-list is extremely complicated (most git commands end up using it; it's the engine for git log, git bisect, rebase, filter-branch, and so on), this particular case is not too hard: with the two-dot notation, rev-list lists every commit reachable from the right-hand side (including that commit itself), excluding every commit reachable from the left-hand side.
In this case, git rev-list HEAD finds all commits reachable from HEAD (that is, almost all commits: commits 0-5 and 7-9), and git rev-list master finds all commits reachable from master, which is commits 0, 1, and 2. Subtracting 0-through-2 from 0-5,7-9 leaves 3-5,7-9. These are the candidate commits to copy, as listed by git rev-list master..HEAD.
For our second command, we have A..HEAD instead of master..HEAD, so the commits to subtract are 0-6. Commit #6 doesn't appear in the HEAD set, but that's fine: subtracting away something that's not there leaves it not there. The resulting set of candidates to copy is therefore 7-9.
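The set subtraction described above can be sketched in a few lines of Python. This is only a model of the two-dot selection on the example graph, with made-up names (PARENTS, BRANCHES, reachable, two_dot); it is not how git rev-list is implemented.

```python
# Hypothetical model of the numbered graph: commit -> parent (None for the root).
PARENTS = {0: None, 1: 0, 2: 1, 3: 2, 4: 3, 5: 4, 6: 5, 7: 5, 8: 7, 9: 8}
BRANCHES = {"master": 2, "A": 6, "B": 9}
HEAD = BRANCHES["B"]  # we are on branch B


def reachable(tip):
    """Return the set of commits found by following parent links from tip."""
    seen = set()
    while tip is not None:
        seen.add(tip)
        tip = PARENTS[tip]
    return seen


def two_dot(exclude, include):
    """Model exclude..include: reachable from include, minus reachable from exclude."""
    return sorted(reachable(include) - reachable(exclude))


print(two_dot(BRANCHES["master"], HEAD))  # [3, 4, 5, 7, 8, 9]
print(two_dot(BRANCHES["A"], HEAD))       # [7, 8, 9]
```

Note that commit 6 is in the excluded set for A..HEAD but was never in the HEAD set, so removing it changes nothing.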
That still leaves us with figuring out the target of the rebase, i.e., where should copied commits land? With the second command, the answer is "the commit identified by the --onto argument". Since we said --onto master, that means the target is commit #2.
git rebase master
With the first command, though, we didn't specify a target directly, so git uses the commit identified by <upstream>. The <upstream> we gave was master, which points to commit #2, so the target is commit #2.
The first command is therefore going to start by copying commit #3 with whatever minimal changes are needed so that its parent is commit #2. Its parent is already commit #2. Nothing has to change, so nothing changes, and rebase just re-uses the existing commit #3. It must then copy #4 so that its parent is #3, but the parent is already #3, so it just re-uses #4. Likewise, #5 is already good. It completely ignores #6 (that's not in the set of commits to copy); it checks commits 7-9 but they're all good as well, so the whole rebase ends up just re-using all the original commits. You can force copies anyway with -f, but you didn't, so this whole rebase ends up doing nothing.
git rebase --onto master A
The second rebase command used --onto to select #2 as its target, but told git to copy just commits 7-9. Commit #7's parent is commit #5, so this copy really has to do something.[2] So git makes a new commit (let's call this #7a) that has commit #2 as its parent. The rebase moves on to commit #8: the copy, #8a, needs #7a as its parent. Finally, the rebase moves on to commit #9, whose copy, #9a, needs #8a as its parent. With all commits copied, the last thing rebase does is move the label (remember, labels move and change!). This gives a graph like this:
          7a - 8a - 9a <-- HEAD=B
         /
0 - 1 - 2 <-- master
         \
          3 - 4 - 5 - 6 <-- A
                   \
                    7 - 8 - 9 [abandoned]
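The two rebases can be compared with a small simulation. The sketch below is a simplified, conflict-free model, not git's implementation: the rebase function, its arguments, and the "7a"-style copy names are invented for this example, and it relies on the commit numbers in this particular graph increasing from parent to child so that sorting gives oldest-first order.

```python
# Hypothetical model of the numbered graph: commit -> parent (None for the root).
PARENTS = {0: None, 1: 0, 2: 1, 3: 2, 4: 3, 5: 4, 6: 5, 7: 5, 8: 7, 9: 8}


def reachable(tip):
    """Return the set of commits found by following parent links from tip."""
    seen = set()
    while tip is not None:
        seen.add(tip)
        tip = PARENTS[tip]
    return seen


def rebase(upstream, onto, head):
    """Return (new branch tip, {copy: parent}) for a simplified rebase."""
    # Commits to copy: upstream..head. Sorting is oldest-first only because of
    # how this example numbers its commits (an assumption of the sketch).
    to_copy = sorted(reachable(head) - reachable(upstream))
    new_parent = onto
    created = {}
    for commit in to_copy:
        if PARENTS[commit] == new_parent:
            new_parent = commit      # parent already correct: re-use the original
        else:
            copy = f"{commit}a"      # made-up name for the copied commit
            created[copy] = new_parent
            new_parent = copy
    return new_parent, created


print(rebase(upstream=2, onto=2, head=9))  # git rebase master: (9, {})
print(rebase(upstream=6, onto=2, head=9))  # git rebase --onto master A
```

The first call re-uses every commit and leaves the tip at 9, which is the no-op described earlier; the second creates 7a, 8a, and 9a on top of commit 2 and leaves the originals untouched.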
What about git rebase --onto master A B?
This is almost the same as git rebase --onto master A. The difference is that extra B at the end. Fortunately, this difference is very simple: if you give git rebase that one extra argument, it runs git checkout on that argument first.[3]
In your first set of commands, you ran git rebase master while on branch B. As noted above, this is a big no-op: since nothing needs to move, git copies nothing at all (unless you use -f / --force, which you didn't). You then checked out master and used git merge B, which (if it is told to[4]) creates a new commit with the merge. Therefore Dherik's answer, as of the time I saw it at least, is correct here: the merge commit has two parents, one of which is the tip of branch B, and that branch reaches back through three commits that are on branch A, and therefore some of what's on A winds up being merged into master.
With your second command sequence, you first checked out B (you were already on B so this was redundant, but was part of the git rebase). You then had rebase copy three commits, producing the final graph above, with commits 7a, 8a, and 9a. You then checked out master and made a merge commit with B (see footnote 4 again). Again Dherik's answer is correct: the only thing missing is that the original, abandoned commits are not drawn in and it's not as obvious that the new merged-in commits are copies.
[1] This only matters in that it's extraordinarily difficult to target a particular checksum. That is, if someone you trust tells you "I trust the commit with ID 1234567...", it's almost impossible for someone else (someone you may not trust so much) to come up with a commit that has that same ID, but has different contents. The chances of it happening by accident are 1 in 2^160, which is much less likely than you having a heart attack while being struck by lightning while drowning in a tsunami while being abducted by space aliens. :-)
[2] The actual copy is made using the equivalent of git cherry-pick: git compares the commit's tree with its parent's tree to get a diff, then applies the diff to the new parent's tree.
[3] This is actually, literally true at this time: git rebase is a shell script that parses your options, then decides which kind of internal rebase to run: the non-interactive git-rebase--am or the interactive git-rebase--interactive. After it's figured out all the arguments, if there's the one left-over branch name argument, the script does git checkout <branch-name> before starting the internal rebase.
[4] Since master points to commit 2 and commit 2 is an ancestor of commit 9, this would normally not make a merge commit after all, but instead do what Git calls a fast-forward operation. You can instruct Git not to do these fast-forwards using git merge --no-ff. Some interfaces, such as GitHub's web interface and perhaps some GUIs, may separate the different kinds of operations, so that their "merge" forces a true merge like this.
With a fast-forward merge, the final graph for the first case is:
0 <- 1 <- 2 [master used to be here]
           \
            3 <- 4 <- 5 <- 6 <------ A
                       \
                        7 <- 8 <- 9 <-- master, HEAD=B
In either case, commits 0 through 5 and 7 through 9 are now on both branches, master and B. The difference is that with a true merge, the graph itself records that a merge took place, because the merge commit is part of the history.
In other words, the advantage to a fast-forward merge is that it leaves no trace of what is otherwise a trivial operation. The disadvantage of a fast-forward merge is, well, that it leaves no trace. So the question of whether to allow the fast-forward is really a question of whether you want to leave an explicit merge in the history formed by the commits.
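The choice between a fast-forward and a true merge comes down to an ancestry test, which can be sketched as follows. This is a simplified model on the example graph with invented names (merge_kind, allow_ff); it only mirrors the decision described in footnote 4, with allow_ff=False standing in for git merge --no-ff.

```python
# Hypothetical model of the numbered graph: commit -> parent (None for the root).
PARENTS = {0: None, 1: 0, 2: 1, 3: 2, 4: 3, 5: 4, 6: 5, 7: 5, 8: 7, 9: 8}


def reachable(tip):
    """Return the set of commits found by following parent links from tip."""
    seen = set()
    while tip is not None:
        seen.add(tip)
        tip = PARENTS[tip]
    return seen


def merge_kind(current, other, allow_ff=True):
    """Classify what merging `other` into `current` would do in this model."""
    if other in reachable(current):
        return "already up to date"   # nothing new to bring in
    if allow_ff and current in reachable(other):
        return "fast-forward"         # just slide the label forward
    return "true merge"               # a new commit with two parents


print(merge_kind(2, 9))                  # master at 2, merging B at 9
print(merge_kind(2, 9, allow_ff=False))  # the same merge with fast-forward disabled
print(merge_kind(6, 9))                  # A at 6 and B at 9 have diverged
```

With master at commit 2 and B at commit 9, the model reports a fast-forward unless fast-forwarding is disabled, in which case it reports a true merge.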