Short version: Do you need `--preserve-merges` only if you explicitly merged after you made a local commit? What exactly happens otherwise? Does it reapply your committed code to the merged branch?
Please explain when it is useful to use `git pull --rebase --preserve-merges` vs. a regular `git pull --rebase`.
I read about an issue with `git pull --rebase` here: http://notes.envato.com/developers/rebasing-merge-commits-in-git/
That could cause code changes to be duplicated.
I read here: When will `git pull --rebase` get me in to trouble?
That it only happens if you rebase after some of the commits have been pushed.
So I am not sure I understand when I would need `git pull --rebase --preserve-merges`, and whether it's ever bad to use compared with `git pull --rebase`.
Technically (and I claim this is a bit stupid of git; the `pull` script, which is a shell script, should just do this for you), you have to run `git pull --rebase=preserve` rather than attempting to use `git pull --rebase --preserve-merges`. (Or, as I noted in a comment on Vlad Nikitin's answer, you can set `branch.name.rebase` to `preserve` to get the same effect automatically.)
In other words, you should never run `git pull --rebase --preserve-merges`, as it (incorrectly) passes `--preserve-merges` to the `fetch` step instead of to the `merge`-or-`rebase` step. However, you can run `git pull --rebase=preserve`.
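As a minimal sketch of the configuration route mentioned above, the following sets the per-branch rebase mode in a throwaway repository and reads it back. The branch name `experiment` stands in for the `name` part of `branch.name.rebase`, and the temporary directory is purely illustrative. The value `preserve` is the one given in this answer; as far as I know, newer Git releases dropped it in favor of `merges`, so adjust `mode` to match the Git you have.

```shell
#!/bin/sh
# Sketch: configure a branch so that a plain "git pull" does a
# merge-preserving rebase. Runs in a scratch repo; nothing is pulled.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git checkout -q -b experiment      # stand-in for your branch name

mode=preserve                      # the answer's value; newer Git spells it "merges"
git config branch.experiment.rebase "$mode"

# Read the setting back to confirm what "git pull" would see.
rebase_mode=$(git config --get branch.experiment.rebase)
echo "branch.experiment.rebase = $rebase_mode"
```

Only the `git config` call matters for a real repository; the rest is scaffolding so the snippet can run on its own.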
The question of when (and whether) to use any kind of rebase, whether merge-preserving or not, is more a matter of opinion, which means it really does not go well on Stack Overflow in the first place. :-)
Still, I'll make one claim here: you should only rebase if you know (in a sort of general sense) what you are doing,[1] and if you do know what you are doing, you would probably prefer a merge-preserving rebase as a general rule. That said, by the time you've decided that rebasing is a good idea, you will probably find that a history that has its own embedded branch-and-merge-points is not necessarily the correct "final rewritten history".
That is, if it's appropriate to do a rebase at all, it's at least fairly likely that the history to be rebased is itself linear, so that the preserve-vs-flatten question is moot anyway.
Here's a drawing of part of a commit graph, showing two named branches, `mainline` and `experiment`. The common base for `mainline` and `experiment` is commit node `A`, and `mainline` has a commit `G` that is not on the `experiment` branch:
    ...--o--A-------------G   <-- mainline
             \
              \  .-C-.
               B      E--F   <-- experiment
                \_D_/
Note that the `experiment` branch has a branch-and-merge within it too, though: the base for these two branches is `B`, one branch holds commit `C`, and the other branch holds commit `D`. These two (unnamed) branches shrink back to a single thread of development at merge commit `E`, and then commit `F` sits atop the merge commit and is the tip of branch `experiment`.
Here's what happens if you are on `experiment` and run `git rebase mainline`:
$ git rebase mainline
First, rewinding head to replay your work on top of it...
Applying: B
Applying: C
Applying: D
Applying: F
Here's what is now in the commit graph:
    ...--o--A--G   <-- mainline
                \
                 B'-C'-D'-F'   <-- experiment
The "structural branch" that used to be there on branch `experiment` is gone. The `rebase` operation copied all the changes I'd made in commits `B`, `C`, `D`, and `F`; these became the new commits `B'`, `C'`, `D'`, and `F'`. (Commit `E` was a pure merge with no changes and did not require copying. I have not tested what happens if I rebase a merge with embedded changes, either to resolve conflicts or, as some call it, an "evil merge".)
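The flattening described above can be reproduced in a throwaway repository. This is a minimal sketch, not the exact session from the answer: the branch name `side` (for the otherwise unnamed branch holding `D`), the `commit` helper, the placeholder identity, and the one-file-per-commit layout are all assumptions made so the commits never conflict.

```shell
#!/bin/sh
# Sketch: build A, B, C, D, merge E, F on "experiment" and G on "mainline",
# then run the default (flattening) rebase and count surviving merges.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git checkout -q -b mainline
git config user.name "Example"               # placeholder identity
git config user.email "example@example.com"

commit() {                                   # commit <name>: one new file per commit
    echo "$1" > "$1.txt"
    git add "$1.txt"
    git commit -q -m "$1"
}

commit A
git checkout -q -b experiment
commit B
git checkout -q -b side                      # hypothetical name for the branch holding D
commit D
git checkout -q experiment
commit C
git merge -q --no-ff -m E side               # E: a pure merge with no changes of its own
commit F
git checkout -q mainline
commit G

git checkout -q experiment
git rebase -q mainline                       # default rebase: merges are not recreated

git --no-pager log --graph --oneline mainline experiment
merges_after_flat=$(git rev-list --merges --count mainline..experiment)
echo "merge commits left on experiment: $merges_after_flat"
```

The graph printed at the end should show a single line of copied commits on top of `G`, with no trace of `E`.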
On the other hand, if I do this:
$ git rebase --preserve-merges mainline
[git grinds away doing the rebase; this takes a bit longer
than the "flattening" rebase, and there is a progress indicator]
Successfully rebased and updated refs/heads/experiment.
I get this graph instead:
    ...--o--A--G   <-- mainline
                \
                 \  .-C'.
                  B'     E'-F'   <-- experiment
                   \_D'/
This has preserved the merge, and hence the "internal branchiness", of `experiment`. Is that good? Bad? Indifferent? Read the (very long) footnote!
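The merge-preserving result can be checked the same way in a scratch repository. One loud assumption: this sketch uses `git rebase --rebase-merges` rather than the `--preserve-merges` flag shown above, because, as far as I know, newer Git releases removed `--preserve-merges` and offer `--rebase-merges` as its replacement. On an older Git where `--preserve-merges` still exists, substitute it in the marked line. The branch name `side`, the `commit` helper, and the placeholder identity are again hypothetical.

```shell
#!/bin/sh
# Sketch: same graph as in the answer, but rebased so that merge E is recreated.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git checkout -q -b mainline
git config user.name "Example"               # placeholder identity
git config user.email "example@example.com"

commit() {                                   # commit <name>: one new file per commit
    echo "$1" > "$1.txt"
    git add "$1.txt"
    git commit -q -m "$1"
}

commit A
git checkout -q -b experiment
commit B
git checkout -q -b side                      # hypothetical name for the branch holding D
commit D
git checkout -q experiment
commit C
git merge -q --no-ff -m E side               # E: a pure merge
commit F
git checkout -q mainline
commit G

git checkout -q experiment
# Swapped-in flag: --rebase-merges instead of the answer's --preserve-merges.
git rebase -q --rebase-merges mainline

git --no-pager log --graph --oneline mainline experiment
merges_kept=$(git rev-list --merges --count mainline..experiment)
echo "merge commits kept on experiment: $merges_kept"
```

Here the log should still show the fork after the copy of `B` and the join at the copy of `E`, now sitting on top of `G`.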
[1] It's a good idea to learn "what rebase does" anyway, which in git (alas!) pretty much requires learning "how it does it" as well, at least at a medium level. Basically, rebase makes copies of (the changes from your earlier) commits, which you then apply to (your or someone else's) later commits, making it "seem like" you did the work in some other order. A simple example: two developers, let's say Alice and Bob, are both working on the same branch. Let's say that Marketing has asked for a feature code-named Strawberry, and both Alice and Bob are doing some work to implement `strawberry`, both on a branch named `strawberry`.
Alice and Bob both run `git fetch` to bring `strawberry` over from `origin`.
Alice discovers that file `abc` needs some change to prepare for the new feature. She writes that and commits, but does not push yet.
Bob writes a description of the new feature, which changes file `README` but has no other effect. Bob commits his change and pushes.
Alice then updates file `feat` to provide the actual feature. She writes and commits (separately) that, and is now ready to push. But, oh no, Bob beat her to it:
$ git push origin strawberry
...
! [rejected] strawberry -> strawberry (non-fast-forward)
Alice should then fetch the changes and look at them (not just blindly merge or rebase):
$ git fetch
...
$ git log origin/strawberry
(or using `gitk` or whatever; I tend to use `git lola` myself, and `git show` individual commits if/as needed).
She can see from this that Bob only changed the `README`, so her changes are definitely not affected either way. At this point, she can tell that it's safe to rebase her changes onto `origin/strawberry`:
$ git rebase origin/strawberry
(Note that there are no merges to preserve.) This makes it look (in terms of git history) like she first waited for Bob to update the documentation, and only then actually started to implement the changes, which are still split into two separate commits so that it's easy to tell, later, whether the change to file `abc` broke anything else. Those two separate commits are now adjacent, though, so it's easy to tell, later, that the point of the change to `abc` was to enable the change to file `feat`. And since the change to `README` comes first, it's even more clear that this was the point of the change to `abc`. Not that it would be hard to tell even if Alice just did:
$ git merge origin/strawberry
instead, although that creates a merge commit whose only point seems to be to say "Alice started in on `abc` before Bob finished updating `README`, and finished `feat` after", which is not really helpful.
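The Alice-and-Bob sequence can be replayed locally with one bare repository and two clones. This is only a sketch under stated assumptions: the directory names (`origin.git`, `seed`, `alice`, `bob`), the `identify` helper, the file contents, and the commit messages are all invented for illustration, and a local bare repository stands in for the shared `origin`.

```shell
#!/bin/sh
# Sketch: Bob pushes first, Alice's push is rejected, she fetches,
# inspects, rebases her two commits, and pushes a linear history.
set -e
work=$(mktemp -d)
cd "$work"
git init -q --bare origin.git                # stand-in for the shared remote

identify() {                                 # hypothetical helper: per-clone identity
    git config user.name "$1"
    git config user.email "$1@example.com"
}

# Seed the strawberry branch with the three files from the story.
git init -q seed
cd seed
git checkout -q -b strawberry
identify seed
echo readme > README; echo abc > abc; echo feat > feat
git add README abc feat
git commit -q -m "initial"
git push -q ../origin.git strawberry
cd ..

git clone -q -b strawberry origin.git alice
git clone -q -b strawberry origin.git bob

cd alice
identify alice
echo "prepare" >> abc
git commit -q -am "abc: prepare for strawberry"     # committed, not pushed
cd ../bob
identify bob
echo "describe strawberry" >> README
git commit -q -am "README: describe strawberry"
git push -q origin strawberry                        # Bob gets there first
cd ../alice
echo "feature" >> feat
git commit -q -am "feat: implement strawberry"

# Alice's push is refused because origin has moved on.
if git push -q origin strawberry 2>/dev/null; then
    first_push=accepted
else
    first_push=rejected
fi

git fetch -q origin
git --no-pager log --oneline strawberry..origin/strawberry   # look before integrating
git rebase -q origin/strawberry              # no merges to preserve here
git push -q origin strawberry

merge_count=$(git rev-list --merges --count origin/strawberry)
echo "first push: $first_push, merge commits in history: $merge_count"
```

Replacing the `git rebase` line with `git merge origin/strawberry` would also let the final push succeed, but would leave the extra merge commit discussed above.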
In more complex cases, where Bob did more than just update the documentation, Alice might find that it's best to rearrange her own commits (probably more than two in this case) into a new, different linear history, so that some of Bob's changes (this time, probably more than one commit) are "in the middle", for instance, as if they had co-operated in real time (and who knows, maybe they did). Or she might find that it's better to keep her changes as a separate development line that merges, perhaps even more than once, with Bob's changes.
It's all a matter of what will provide the most useful information to someone(s)—possibly Alice and Bob, possibly other developers—in the future, if and when it becomes necessary to go back and look at the (apparent, if rebased, or actual if not) sequence of events. Sometimes each individual commit is useful information. Sometimes it's more useful to rearrange and combine commits, or drop some commits entirely: for instance, changes that proved to be a bad idea. (But consider leaving them in just for the value of pointing out "this was a bad idea so don't try it again in the future" as well!)