I have a colleague who claims that git pull is harmful, and gets upset whenever someone uses it.
The git pull command seems to be the canonical way to update your local repository. Does using git pull create problems? What problems does it create? Is there a better way to update a git repository?
By default, git pull creates merge commits, which add noise and complexity to the code history. In addition, pull makes it easy to not think about how your changes might be affected by incoming changes.
The git pull command is safe so long as it only performs fast-forward merges. If git pull is configured to only do fast-forward merges and a fast-forward merge isn't possible, Git will exit with an error. This gives you an opportunity to study the incoming commits, think about how they might affect your local commits, and decide the best course of action (merge, rebase, reset, etc.).
With Git 2.0 and newer, you can run:
git config --global pull.ff only
to alter the default behavior to only fast-forward. With Git versions between 1.6.6 and 1.9.x you'll have to get into the habit of typing:
git pull --ff-only
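To see what the fast-forward-only behavior looks like in practice, here is a minimal sketch. It builds two throwaway repositories under a temporary directory, lets them diverge, and runs git pull --ff-only. The directory names (upstream, local), the demo identity, and the empty commits are assumptions made purely for illustration.

```shell
#!/bin/sh
# Sketch: git pull --ff-only refuses to merge when histories have diverged.
set -eu

# Throwaway identity so commits work without any global configuration.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)

# "upstream" stands in for the shared repository; "local" is your clone.
git init -q "$work/upstream"
git -C "$work/upstream" commit -q --allow-empty -m "initial"
git clone -q "$work/upstream" "$work/local"

# Someone else commits upstream while you commit locally: history diverges.
git -C "$work/upstream" commit -q --allow-empty -m "upstream change"
git -C "$work/local" commit -q --allow-empty -m "local change"

# The fetch half of the pull succeeds; the merge half exits with an error.
if git -C "$work/local" pull --ff-only >/dev/null 2>&1; then
    ff_result=fast-forwarded
else
    ff_result=refused
fi

# Nothing was merged, so the incoming commit is still waiting to be reviewed.
incoming=$(git -C "$work/local" rev-list --count 'HEAD..@{u}')
echo "pull --ff-only: $ff_result, incoming commits to review: $incoming"

rm -rf "$work"
```

The refusal leaves your branch exactly where it was, so you can inspect the fetched commits and then choose to merge, rebase, or reset.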
However, with all versions of Git, I recommend configuring a git up alias like this:
git config --global alias.up '!git remote update -p; git merge --ff-only @{u}'
and using git up instead of git pull. I prefer this alias over git pull --ff-only because it works with all versions of Git, fetches all upstream commits, and cleans out old origin/* branches that no longer exist upstream.
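git pull isn't bad if it is used properly. Several recent changes to Git have made it easier to use git pull properly, but unfortunately the default behavior of a plain git pull has several problems: it adds unnecessary merge commits to the history, it can reintroduce commits that were rebased out upstream, it modifies your working directory in unpredictable ways, it makes it awkward to pause and review someone else's commits, it makes it hard to rebase onto the remote branch, and it doesn't clean up branches that were deleted from the remote repository.
These problems are described in greater detail below.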
By default, the git pull command is equivalent to running git fetch followed by git merge @{u}. If there are unpushed commits in the local repository and new commits upstream, the merge part of git pull creates a merge commit.
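The fetch-then-merge sequence can be reproduced by hand. The sketch below spells out the two steps explicitly instead of calling git pull, partly because recent Git releases may refuse a plain git pull on diverged branches until a reconciliation strategy is configured. The repository names, demo identity, and empty commits are illustrative assumptions.

```shell
#!/bin/sh
# Sketch: fetch followed by merge @{u} leaves a two-parent merge commit.
set -eu

# Throwaway identity so commits work without any global configuration.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
git init -q "$work/upstream"
git -C "$work/upstream" commit -q --allow-empty -m "initial"
git clone -q "$work/upstream" "$work/local"

# One new commit on each side, so a fast-forward is impossible.
git -C "$work/upstream" commit -q --allow-empty -m "upstream change"
git -C "$work/local" commit -q --allow-empty -m "unpushed local change"

# The two halves of a merge-style pull, run separately.
git -C "$work/local" fetch -q
git -C "$work/local" merge -q --no-edit '@{u}'

# rev-list --parents prints the commit hash followed by its parent hashes.
parent_count=$(( $(git -C "$work/local" rev-list --parents -n 1 HEAD | wc -w) - 1 ))
echo "parents of HEAD: $parent_count"
git -C "$work/local" log --oneline --graph

rm -rf "$work"
```

The graph output shows the extra merge node sitting on top of the two otherwise unrelated commits.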
There is nothing inherently bad about merge commits, but they can be dangerous and should be treated with respect.
Of course there is a time and a place for merges, but understanding when merges should and should not be used can improve the usefulness of your repository.
Note that the purpose of Git is to make it easy to share and consume the evolution of a codebase, not to precisely record history exactly as it unfolded. (If you disagree, consider the rebase command and why it was created.) The merge commits created by git pull do not convey useful semantics to others; they just say that someone else happened to push to the repository before you were done with your changes. Why have those merge commits if they aren't meaningful to others and could be dangerous?
It is possible to configure git pull to rebase instead of merge, but this also has problems (discussed later). Instead, git pull should be configured to only do fast-forward merges.
Suppose someone rebases a branch and force pushes it. This generally shouldn't happen, but it's sometimes necessary (e.g., to remove a 50GiB log file that was accidentally committed and pushed). The merge done by git pull will merge the new version of the upstream branch into the old version that still exists in your local repository. If you push the result, pitchforks and torches will start coming your way.
Some may argue that the real problem is force updates. Yes, it's generally advisable to avoid force pushes whenever possible, but they are sometimes unavoidable. Developers must be prepared to deal with force updates, because they will happen sometimes. This means not blindly merging in the old commits via an ordinary git pull.
There's no way to predict what the working directory or index will look like until git pull is done. There might be merge conflicts that you have to resolve before you can do anything else. It might introduce a 50GiB log file in your working directory because someone accidentally pushed it. It might rename a directory you are working in.
git remote update -p (or git fetch --all -p) allows you to look at other people's commits before you decide to merge or rebase, allowing you to form a plan before taking action.
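As a rough illustration of looking before merging, the sketch below fetches with git remote update -p and then lists the incoming commits and the files they touch. The file name notes.txt, the commit message, and the throwaway repositories are hypothetical and exist only for this example.

```shell
#!/bin/sh
# Sketch: inspect incoming commits after fetching, before merging anything.
set -eu

# Throwaway identity so commits work without any global configuration.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
git init -q "$work/upstream"
git -C "$work/upstream" commit -q --allow-empty -m "initial"
git clone -q "$work/upstream" "$work/local"

# A colleague adds a file upstream (hypothetical file name).
echo "draft" > "$work/upstream/notes.txt"
git -C "$work/upstream" add notes.txt
git -C "$work/upstream" commit -q -m "add notes"

# Download the new commits without touching the local branch.
git -C "$work/local" remote update -p >/dev/null 2>&1

# HEAD..@{u}  : commits that upstream has and the local branch lacks.
# HEAD...@{u} : changes on the upstream side since the common ancestor.
incoming=$(git -C "$work/local" log --format=%s 'HEAD..@{u}')
changed=$(git -C "$work/local" diff --name-only 'HEAD...@{u}')
echo "incoming commits: $incoming"
echo "files they touch: $changed"

rm -rf "$work"
```

With that information in hand, a 50GiB surprise or a renamed directory is visible before it lands in the working directory.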
Suppose you are in the middle of making some changes and someone else wants you to review some commits they just pushed. git pull's merge (or rebase) operation modifies the working directory and index, which means your working directory and index must be clean.
You could use git stash and then git pull, but what do you do when you're done reviewing? To get back to where you were you have to undo the merge created by git pull and apply the stash.
git remote update -p (or git fetch --all -p) doesn't modify the working directory or index, so it's safe to run at any time, even if you have staged and/or unstaged changes. You can pause what you're doing and review someone else's commit without worrying about stashing or finishing up the commit you're working on. git pull doesn't give you that flexibility.
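The following sketch checks that claim on a deliberately dirty clone: it records the status and HEAD, fetches, and compares. The file names (README, todo.txt) and the repositories are made up for the example.

```shell
#!/bin/sh
# Sketch: fetching leaves staged and unstaged work untouched.
set -eu

# Throwaway identity so commits work without any global configuration.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
git init -q "$work/upstream"
echo "v1" > "$work/upstream/README"
git -C "$work/upstream" add README
git -C "$work/upstream" commit -q -m "initial"
git clone -q "$work/upstream" "$work/local"

# Work in progress: one unstaged edit and one staged new file.
echo "half-finished edit" >> "$work/local/README"
echo "remember this" > "$work/local/todo.txt"
git -C "$work/local" add todo.txt
before=$(git -C "$work/local" status --porcelain)
head_before=$(git -C "$work/local" rev-parse HEAD)

# Meanwhile, a colleague pushes something they want reviewed.
git -C "$work/upstream" commit -q --allow-empty -m "commit to review"

# Fetch with the working directory still dirty.
git -C "$work/local" fetch --all -p -q
after=$(git -C "$work/local" status --porcelain)
head_after=$(git -C "$work/local" rev-parse HEAD)

to_review=$(git -C "$work/local" log --format=%s 'HEAD..@{u}')
echo "status unchanged: $( [ "$before" = "$after" ] && echo yes || echo no )"
echo "ready to review: $to_review"

rm -rf "$work"
```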
A common Git usage pattern is to do a git pull to bring in the latest changes followed by a git rebase @{u} to eliminate the merge commit that git pull introduced. It's common enough that Git has some configuration options to reduce these two steps to a single step by telling git pull to perform a rebase instead of a merge (see the branch.<branch>.rebase, branch.autosetuprebase, and pull.rebase options).
Unfortunately, if you have an unpushed merge commit that you want to preserve (e.g., a commit merging a pushed feature branch into master), neither a rebase-pull (git pull with branch.<branch>.rebase set to true) nor a merge-pull (the default git pull behavior) followed by a rebase will work. This is because git rebase eliminates merges (it linearizes the DAG) without the --preserve-merges option. The rebase-pull operation can't be configured to preserve merges, and a merge-pull followed by a git rebase -p @{u} won't eliminate the merge caused by the merge-pull.
Update: Git v1.8.5 added git pull --rebase=preserve and git config pull.rebase preserve. These cause git pull to do git rebase --preserve-merges after fetching the upstream commits. (Thanks to funkaster for the heads-up!)
git pull doesn't prune remote-tracking branches corresponding to branches that were deleted from the remote repository. For example, if someone deletes branch foo from the remote repo, you'll still see origin/foo.
This leads to users accidentally resurrecting killed branches because they think they're still active.
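Here is a small sketch of the stale-branch behavior, assuming fetch.prune is not set in your configuration. It deletes a branch named foo in a throwaway upstream repository, then compares what the clone sees after a plain git pull and after a pruning fetch. The has_foo helper is a hypothetical convenience function defined only for this example.

```shell
#!/bin/sh
# Sketch: a plain pull keeps origin/foo around; fetch -p removes it.
set -eu

# Throwaway identity so commits work without any global configuration.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
git init -q "$work/upstream"
git -C "$work/upstream" commit -q --allow-empty -m "initial"
git -C "$work/upstream" branch foo
git clone -q "$work/upstream" "$work/local"

# Someone deletes the branch in the remote repository.
git -C "$work/upstream" branch -D foo >/dev/null

# Hypothetical helper: succeeds if the clone still has origin/foo.
has_foo() {
    git -C "$work/local" show-ref --verify --quiet refs/remotes/origin/foo
}

# A plain pull (nothing to merge here) does not prune.
git -C "$work/local" pull >/dev/null 2>&1
if has_foo; then after_pull=present; else after_pull=gone; fi

# A pruning fetch removes the stale remote-tracking branch.
git -C "$work/local" fetch -q -p
if has_foo; then after_prune=present; else after_prune=gone; fi

echo "origin/foo after pull: $after_pull, after fetch -p: $after_prune"

rm -rf "$work"
```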
git up instead of git pull
Instead of git pull, I recommend creating and using the following git up alias:
git config --global alias.up '!git remote update -p; git merge --ff-only @{u}'
This alias downloads all of the latest commits from all upstream branches (pruning the dead branches) and tries to fast-forward the local branch to the latest commit on the upstream branch. If successful, then there were no local commits, so there was no risk of merge conflict. The fast-forward will fail if there are local (unpushed) commits, giving you an opportunity to review the upstream commits before taking action.
This still modifies your working directory in unpredictable ways, but only if you don't have any local changes. Unlike git pull, git up will never drop you to a prompt expecting you to fix a merge conflict.
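To try the alias without touching your real configuration, the sketch below defines it in a throwaway clone only (no --global) and runs it in both situations described above: upstream strictly ahead, and diverged histories. The repository names, demo identity, and empty commits are assumptions for illustration.

```shell
#!/bin/sh
# Sketch: git up fast-forwards when it can and refuses when it cannot.
set -eu

# Throwaway identity so commits work without any global configuration.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
git init -q "$work/upstream"
git -C "$work/upstream" commit -q --allow-empty -m "initial"
git clone -q "$work/upstream" "$work/local"

# Same alias as above, but stored in this clone's own config.
git -C "$work/local" config alias.up \
    '!git remote update -p; git merge --ff-only @{u}'

# Case 1: only upstream has new commits, so a fast-forward is possible.
git -C "$work/upstream" commit -q --allow-empty -m "upstream change 1"
if git -C "$work/local" up >/dev/null 2>&1; then
    case1=fast-forwarded
else
    case1=refused
fi

# Case 2: both sides have new commits, so the fast-forward must fail.
git -C "$work/local" commit -q --allow-empty -m "local change"
git -C "$work/upstream" commit -q --allow-empty -m "upstream change 2"
head_before=$(git -C "$work/local" rev-parse HEAD)
if git -C "$work/local" up >/dev/null 2>&1; then
    case2=fast-forwarded
else
    case2=refused
fi
head_after=$(git -C "$work/local" rev-parse HEAD)

echo "upstream ahead: $case1"
echo "diverged: $case2"

rm -rf "$work"
```

In the diverged case the local branch stays where it was, and the fetched commits are available for review before you decide what to do.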
git pull --ff-only --all -p
The following is an alternative to the above git up alias:
git config --global alias.up 'pull --ff-only --all -p'
This version of git up has the same behavior as the previous git up alias, except that it uses a feature (the -p argument, which is passed to fetch) that may change in future versions of Git.
With Git 2.0 and newer you can configure git pull to only do fast-forward merges by default:
git config --global pull.ff only
This causes git pull to act like git pull --ff-only, but it still doesn't fetch all upstream commits or clean out old origin/* branches, so I still prefer git up.