How can I tell what happened in a Git commit with two parents that did not merge in the changes from the second parent?

DAC · Jan 22, 2015

In Gitk I can see a team member's commit (X) that has two parents. The first parent is his own previous commit (A); the other parent contains lots of other people's commits (1 through 5). After his merge, all changes made by other people (1 through 5 and others) are no longer present at X, B, C, etc.

A------------
              \
               X - B - C 
              /
1--2--3--4--5
           /
e--r--j--k
     /
l--m

If I diff commit X against commit A, it shows no differences; if I diff commit X against commit 5, it shows all the missing changes. Also, at commit X, B, or C, git log does not show changes that were made to files in commits 1 through 5. However, git log --full-history does show the changes that were made in 1 through 5, even though those changes are no longer present in the actual files and the history does not show them being undone. So git log --full-history seems to contradict the current file contents.

I talked to the user who made commit X. He says he did not do a reset or rebase and he says he hasn't reverted any commits during the time in question. However, he says that he does sometimes do a pull origin master that results in everyone else's changes getting put in his index or working tree as if he had made those changes and not the actual authors of those changes. He says when that happens he does a fresh clone and does not push anything from that local repo to master because he believes Git has done something wrong.

Are the two things related (bad pull and bad merge)?

How can I tell exactly what happened so that we can avoid this in the future?

And what causes Git to sometimes place changes pulled from origin master in the local working directory or index as if they were local changes?

Answer

Dietrich Epp · Jan 22, 2015

"However, he says that he does sometimes do a pull origin master that results in everyone else's changes getting put in his index or working tree as if he had made those changes and not the actual authors of those changes."

It sounds like he was getting merge conflicts but does not understand what they are. This is an extremely common problem, and unfortunately, we don't know a good way to avoid it (switching back to SVN doesn't avoid it, for example).

How could this happen, exactly?

Let's call your developers Alice and Bob. Alice made commits 1-5, and Bob made commits A and X. Here is a plausible history.

  1. Bob makes commit A.

  2. Alice makes commits 1-5, and pushes them to the central repository.

  3. Bob tries to push A, but can't, because his repository is out of date.

    $ git push
     ! [rejected]        master -> master (non-fast-forward)
    
  4. Bob then does what you told him to do: he pulls first. However, he gets a merge conflict because commit A and commits 1-5 touch some of the same code.

    $ git pull
    Auto-merging file.txt
    CONFLICT (content): Merge conflict in file.txt
    Automatic merge failed; fix conflicts and then commit the result.
    
  5. Bob sees other people's changes in his working directory, and doesn't understand why the changes are there.

    $ git status
        both modified:   file.txt
    
  6. He thinks Git is doing something wrong, when in fact, Git is asking him to resolve a merge conflict. He tries to check out a fresh copy, but gets an error:

    $ git checkout HEAD file.txt  
    error: path 'file.txt' is unmerged
    
  7. Since it doesn't work, he tries -f:

    $ git checkout -f HEAD file.txt
    warning: path 'file.txt' is unmerged
    
  8. Success! He commits and pushes.

    $ git commit
    $ git push
    
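To see what such a commit looks like from the outside, the steps above can be replayed in a throwaway repository. This is only a sketch under assumptions: the branch name bob, the identities, and the one-line file.txt are made up, and a local branch merge stands in for the pull. The point is the last few lines, which check that the merge commit has two parents but a tree identical to its first parent.

```shell
#!/bin/sh
# Sketch: reproduce a merge that silently discards the second parent's changes.
# The names (file.txt, branch "bob", the identity) are hypothetical.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # force a known branch name
git config user.name "Demo"
git config user.email "demo@example.com"

# Common ancestor of both lines of work.
echo "base" > file.txt
git add file.txt
git commit -q -m "base"

# Bob's commit A on his own branch.
git checkout -q -b bob
echo "bob" > file.txt
git commit -q -am "A"

# Alice's commit on master touches the same line (stands in for 1-5).
git checkout -q master
echo "alice" > file.txt
git commit -q -am "5"

# Bob merges master and hits a conflict, so git merge exits non-zero.
git checkout -q bob
git merge master >/dev/null 2>&1 || true

# Bob forces his own version back and commits, which concludes the merge.
git checkout -f HEAD -- file.txt 2>/dev/null
git commit -q -m "X"

# Inspect X: count its parents and compare its tree with the first parent's.
parent_count=$(git rev-list --parents -n 1 HEAD | awk '{print NF - 1}')
merge_tree=$(git rev-parse 'HEAD^{tree}')
first_parent_tree=$(git rev-parse 'HEAD^1^{tree}')
echo "parents: $parent_count"
git diff --quiet HEAD^1 HEAD && echo "X is identical to its first parent (A)"
git diff --quiet HEAD^2 HEAD || echo "X differs from its second parent (5)"
```

On a real repository, the same three inspection commands can be pointed at the suspicious commit instead of HEAD.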

The part where it gets harder

There are a lot of git tools out there. Seriously. Visual Studio and Xcode both come with Git integration, there are several other GUIs, and there are even multiple command-line clients. People are also sloppy with the way they describe how they use Git, and most developers are not quite comfortable enough with how Git works outside of the "pull commit push" workflow.

There was an excellent paper on this very subject not too long ago (I'm having a hard time finding it). Some of the conclusions were (forgive my memory):

  • Most developers don't really know how to use source control, except for a few really simple commands (commit, push).

  • When source control doesn't behave the way developers expect, they resort to tactics such as copy-pasting some command they don't quite understand to "fix things", adding the -f flag, or erasing the repository and starting again with a clean copy.

  • On development teams, it is often the case that only the lead developers really know what is going on in the repo.

So this is really an educational challenge.

I think the key lesson Bob needs to learn is that git pull is really just git fetch followed by git merge. A merge can produce conflicts, and you need to resolve them in a very conscientious and purposeful manner. This applies even when there are no reported conflicts... but let's not blow Bob's mind too much for now!
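
Here is a rough sketch of a pull done as its two separate steps, with a way to back out of a conflict instead of reaching for -f. The repository layout (a bare origin.git plus clones named alice and bob), the helper make_clone, and file.txt are all assumptions made up for the demonstration.

```shell
#!/bin/sh
# Sketch: "git pull" as two explicit steps, with a safe exit from a conflict.
# Paths, identities, file.txt, and make_clone are hypothetical.
set -e
work=$(mktemp -d)
cd "$work"
git init -q --bare origin.git
git --git-dir=origin.git symbolic-ref HEAD refs/heads/master

# Helper: clone the shared repository and give the clone a local identity.
make_clone() {
    git clone -q origin.git "$1" 2>/dev/null
    git -C "$1" config user.name "$1"
    git -C "$1" config user.email "$1@example.com"
}

# Alice creates the first commit and publishes it.
make_clone alice
( cd alice
  git symbolic-ref HEAD refs/heads/master
  echo "base" > file.txt
  git add file.txt
  git commit -q -m "base"
  git push -q origin master )

# Bob clones, then Alice pushes a change to the same line.
make_clone bob
( cd alice
  echo "alice" > file.txt
  git commit -q -am "5"
  git push -q origin master )

# Bob commits his own change to that line.
cd bob
echo "bob" > file.txt
git commit -q -am "A"
before=$(git rev-parse HEAD)

# Step 1 of a pull: download new commits; the working tree is untouched.
git fetch -q origin

# Step 2 of a pull: merge them. This is the step that can report a conflict.
if git merge origin/master >/dev/null 2>&1; then
    echo "merged cleanly"
else
    git status --short    # unmerged paths are listed here
    # Not ready to resolve it? Return to the pre-merge state.
    git merge --abort
fi
```

After git merge --abort, Bob's branch and files are exactly as they were before the merge started, so nothing is lost and he can ask for help before trying again.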

The other key lesson here is that lead developers need to take the time to ensure that everyone on the team can use source control correctly, and understands how pulling, pushing, branching, and merging are all related. This is a great opportunity for a lunchtime lecture: put together some slides, buy pizza, and talk about how Git works.
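
For the "how can I tell what happened" part, one possible audit is to list merge commits whose tree is identical to their first parent's tree, which is what commit X looks like. This is a sketch, not a definitive check: the helper name find_noop_merges and the demo repository are made up, and a hit is not always a mistake, since a deliberate git merge -s ours produces the same shape (the demo below uses exactly that to create one).

```shell
#!/bin/sh
# Sketch: list merge commits whose tree equals their first parent's tree.
# The helper name find_noop_merges and the demo repository are hypothetical.
set -e

find_noop_merges() {
    # $1: revision whose history to scan (defaults to HEAD)
    for m in $(git rev-list --merges "${1:-HEAD}"); do
        if [ "$(git rev-parse "$m^{tree}")" = "$(git rev-parse "$m^1^{tree}")" ]; then
            echo "$m"
        fi
    done
}

# Demo repository containing one such merge, made on purpose with "-s ours".
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo"
git config user.email "demo@example.com"

echo "base" > file.txt
git add file.txt
git commit -q -m "base"

git checkout -q -b other
echo "other" > other.txt
git add other.txt
git commit -q -m "1"

git checkout -q master
echo "mine" > mine.txt
git add mine.txt
git commit -q -m "A"

# Records "other" as a second parent but keeps only master's content.
git merge -q -s ours -m "X" other

suspects=$(find_noop_merges HEAD)
echo "$suspects"
```

Each commit it prints is worth a conversation with its author about what they saw when they pulled.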