Using git log to display files changed during merge

Blue · Jun 14, 2016

I’m executing the following command:

git log --name-only --pretty="format:%H %s" -- *.sql --grep="JIRA-154"

which returns results in the format:

[commitid1] [comment]
path/to/file1/file1.sql
path/to/file2/file2.sql
path/to/file3/file3.sql

[commitid2] [comment]
path/to/file2/file2.sql
path/to/file4/file4.sql

The output is redirected to a file and the format is exactly what I’m looking for; however, merge commits are a problem. The files that have been changed as part of a merge are never listed. Instead I end up with something like the following:

[commitid3] [merge comment]
[commitid4] [comment]
path/to/file3/file3.sql

I’ve obviously misunderstood something here because I expect to see the files that changed during the merge listed. Is there a way to include these files in the output?

Answer

torek · Jun 14, 2016

TL;DR

Try adding the -m option to the git log options. This makes Git "split" each merge, so that it will diff the merge twice, once against each parent. Without this or some other similar option, git log finds the merges but then does not even look inside them at all.

Also, as ElpieKay commented, you need to put the --grep=<regexp> before the --. It may also be a good idea to write "*.sql", i.e., with quotes, to prevent your shell from expanding the asterisk itself (the details vary from one shell to another and depend on whether there are any *.sql files in your current working directory).
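Putting those three fixes together (-m added, --grep moved ahead of the --, and the pathspec quoted) gives something like the sketch below. It runs in a throwaway repository; the file names, branch name "side" and commit messages are all made up for illustration, and only the git log options come from the question and the advice above.

```shell
#!/bin/sh
# Scratch repository; every file name and commit message here is hypothetical.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "base" > a.sql
echo "notes" > notes.txt
git add a.sql notes.txt
git commit -q -m "JIRA-154 base"
main=$(git rev-parse --abbrev-ref HEAD)   # whatever the default branch is called

git checkout -q -b side
echo "side" > b.sql
git add b.sql
git commit -q -m "JIRA-154 side work"

git checkout -q "$main"
echo "main" > c.sql
git add c.sql
git commit -q -m "JIRA-154 main work"
echo "more notes" >> notes.txt
git commit -q -am "unrelated notes"       # no ticket id, no .sql file

git merge -q --no-ff -m "JIRA-154 merge side" side

# Options (including --grep) go before the "--"; the quoted "*.sql" after it
# is handed to Git as a pathspec instead of being expanded by the shell.
out=$(git log -m --name-only --pretty="format:%H %s" --grep="JIRA-154" -- "*.sql")
printf '%s\n' "$out"
```

In this toy history the merge should show up twice, once per parent, each time with the .sql file that differs from that parent.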

Long version

As Tim Biegeleisen said, the problem stems from the nature of a merge commit.

Normally, to show you what changed in a commit, Git runs a simple git diff parent self, where parent and self are the commit's parent, and the commit itself, respectively. Both git log and git show do this, in slightly different ways and under slightly different circumstances. The most obvious is that git show defaults to showing a diff every time, but git log only does a diff when given -p or one of the various diff control options such as --name-only.
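For an ordinary single-parent commit, the equivalence can be checked by hand. This is only a sketch in a scratch repository with made-up file names and messages; the sed call just strips any blank lines left over from the empty format string.

```shell
#!/bin/sh
# Scratch repository; file names and commit messages are hypothetical.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "v1" > a.sql
git add a.sql
git commit -q -m "first"

echo "v2" > a.sql
echo "new" > b.sql
git add a.sql b.sql
git commit -q -m "second"

# Explicit "git diff parent self" for the newest commit.
by_diff=$(git diff --name-only HEAD~1 HEAD)

# git show does the same comparison on its own.
by_show=$(git show --name-only --pretty="format:" HEAD | sed '/^$/d')

# git log only diffs when asked to, here via --name-only.
log_plain=$(git log -1 --pretty="format:%s" HEAD)
log_names=$(git log -1 --name-only --pretty="format:%s" HEAD)

printf 'diff:\n%s\n\nshow:\n%s\n\nlog:\n%s\n\nlog --name-only:\n%s\n' \
    "$by_diff" "$by_show" "$log_plain" "$log_names"
```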

Merges are different

A merge commit is a commit with two¹ parents. This means that git log and git show would have to run two git diff commands.² And in fact, git show does run two diffs, but then, by default, turns them into a combined diff, which shows only those files whose merge-commit version differs from both parents. But for whatever reason,³ git log does not do this by default.

Even when git log is showing diffs, though, it behaves particularly oddly (I might even say badly) on merges. While git log -p or git log --name-status runs a (single) diff on a regular commit, it does not run the diff at all on a commit with multiple visible parents, unless you force it to.
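The difference between the two commands is easy to see on a small merge. The sketch below is an assumption-laden toy: a scratch repository, a hypothetical shared.sql that both branches edit at opposite ends (so the merge is clean but the result differs from both parents), and made-up commit messages.

```shell
#!/bin/sh
# Scratch repository; shared.sql and the branch name "side" are hypothetical.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

printf '%s\n' 1 2 3 4 5 6 7 8 9 10 > shared.sql
git add shared.sql
git commit -q -m "base"
main=$(git rev-parse --abbrev-ref HEAD)

git checkout -q -b side
printf '%s\n' 1 2 3 4 5 6 7 8 9 side > shared.sql     # edit the last line
git commit -q -am "side work"

git checkout -q "$main"
printf '%s\n' main 2 3 4 5 6 7 8 9 10 > shared.sql    # edit the first line
git commit -q -am "main work"

git merge -q --no-ff -m "merge side" side

# git show: combined diff, so a file differing from both parents is named.
show_names=$(git show --name-only --pretty="format:%s" HEAD)

# git log with the same diff option: no diff is run for the merge.
log_names=$(git log -1 --name-only --pretty="format:%s" HEAD)

printf 'git show:\n%s\n\ngit log:\n%s\n' "$show_names" "$log_names"
```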

Using -m by itself always works. This flag essentially tells git log (and git show) to break up a merge into multiple separate "virtual commits". That is, if commit M is a merge with parents P1 and P2, then—for the purpose of the diff at least—Git acts as though there was a commit MP1 with parent P1, and a second commit MP2 with parent P2. You get two diffs (and two commit IDs in the diff headers).
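Here is a rough illustration of that splitting, again in a scratch repository with hypothetical file names (a.sql, b.sql, c.sql) and commit messages. Each side of the merge adds one file, so each "virtual commit" should list exactly the file that the other side contributed.

```shell
#!/bin/sh
# Scratch repository; all names below are hypothetical.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "base" > a.sql
git add a.sql
git commit -q -m "base"
main=$(git rev-parse --abbrev-ref HEAD)

git checkout -q -b side
echo "side" > b.sql
git add b.sql
git commit -q -m "side work"

git checkout -q "$main"
echo "main" > c.sql
git add c.sql
git commit -q -m "main work"

git merge -q --no-ff -m "merge side" side

# Without -m: the merge is listed, but no diff is run for it.
plain=$(git log -1 --name-only --pretty="format:%h %s")

# With -m: the merge is shown once per parent (M vs P1, then M vs P2).
split=$(git log -1 -m --name-only --pretty="format:%h %s")

printf 'without -m:\n%s\n\nwith -m:\n%s\n' "$plain" "$split"
```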

Adding --first-parent tells git log to ignore the second (and any additional) parent of a merge, which leaves it with just one parent. This means git log won't follow the side branch at all. Hence you can use -m --first-parent, provided you're not interested in histories stemming from the other sides of merges. That gets you a single diff against just the first parent, instead of one diff per parent.
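A small sketch of the -m --first-parent combination, under the same kind of assumptions as before: a scratch repository, hypothetical file names, and a made-up branch called "side". The side-branch commit itself should drop out of the log, while its net effect still appears in the merge's single diff against the first parent.

```shell
#!/bin/sh
# Scratch repository; all names below are hypothetical.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "base" > a.sql
git add a.sql
git commit -q -m "base"
main=$(git rev-parse --abbrev-ref HEAD)

git checkout -q -b side
echo "side" > b.sql
git add b.sql
git commit -q -m "side work"

git checkout -q "$main"
echo "main" > c.sql
git add c.sql
git commit -q -m "main work"

git merge -q --no-ff -m "merge side" side

# Follow only first parents; each merge gets one diff, against that parent.
fp=$(git log -m --first-parent --name-only --pretty="format:%s")
printf '%s\n' "$fp"
```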

(Which parent is first? Well, it's the one that was your HEAD when you ran git merge. That's normally the "main line" of commits, i.e., the ones "on your branch". But if your group uses git pull casually, you probably do not want to ignore the other side of merges, as git pull turns other people's main-line work into "foxtrot merges" of small side branches.)

Combined diffs, again

Besides -m, you can supply -c or --cc (note that -c has one dash while --cc has two⁴) to git log to get it to produce a combined diff, just like git show. But, as with all combined diffs, this ignores files that match up between the merge commit and either parent. That is, given the same merge M again, this time Git compares M vs P1, and M vs P2. For any file F where M:F is the same as either P1:F or P2:F, Git shows nothing at all.

As it turns out, this is usually what you want. If file F in commit M matches file F in one of the two parent commits, that means the file came from that parent. The fact that F in P1 may not match F in P2 is usually not interesting: any change in F in either P1 or P2 is probably a result of some earlier change in history, and that's where we should take note of it, rather than at merge M.
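To make the combined-diff filtering concrete, here is a sketch that compares --cc with -m on one merge. Everything in it is assumed for illustration: a scratch repository, a hypothetical shared.sql edited on both branches, and a hypothetical side_only.sql that exists only on the side branch (so in the merge it matches the second parent exactly).

```shell
#!/bin/sh
# Scratch repository; shared.sql, side_only.sql and "side" are hypothetical.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

printf '%s\n' 1 2 3 4 5 6 7 8 9 10 > shared.sql
git add shared.sql
git commit -q -m "base"
main=$(git rev-parse --abbrev-ref HEAD)

git checkout -q -b side
printf '%s\n' 1 2 3 4 5 6 7 8 9 side > shared.sql
echo "side" > side_only.sql
git add shared.sql side_only.sql
git commit -q -m "side work"

git checkout -q "$main"
printf '%s\n' main 2 3 4 5 6 7 8 9 10 > shared.sql
git commit -q -am "main work"

git merge -q --no-ff -m "merge side" side

# Combined diff: only files that differ from BOTH parents are named.
cc=$(git log -1 --cc --name-only --pretty="format:%s")

# Split diff: one listing per parent, so one-sided files appear as well.
split=$(git log -1 -m --name-only --pretty="format:%s")

printf 'with --cc:\n%s\n\nwith -m:\n%s\n' "$cc" "$split"
```

Here side_only.sql came entirely from the second parent, so the combined form should leave it out while -m still lists it in the diff against the first parent.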

That is the logic behind combined diffs, anyway. It's not applicable in all circumstances, which is why -m exists: to "split up" the merge into its constituent parts.


¹ Two or more, actually, but "more" is unusual; most merge commits have exactly two parents. A merge commit with more than two parents is called an octopus merge.

² Both git log and git show have most of git diff built in to them, so that they do not actually have to run additional commands, but it works out the same either way.

³ I don't know the reason, and I only learned of this particular behavior when I went through the git log source, trying to explain why git log --name-status had not shown something.

⁴ This is because --cc is a long option, and in GNU option parsing, all long options like name-only or cc get two dashes, while all short (one letter) options like p get one dash.