Revert only a single file of a pushed commit

Aki T · Nov 29, 2018

Below is the pushed commit history.

Commit   | Changed Files
Commit 1 | File a, File b, File c
Commit 2 | File a, File b, File c
Commit 3 | File a, File b, File c
Commit 4 | File a, File b, File c
Commit 5 | File a, File b, File c

I want to revert the changes that happened to File b in Commit 3, but I want to keep the changes that happened to this file in Commits 4 and 5.

Answer

torek · Nov 29, 2018

In Git, each commit saves a snapshot—that is, the state of every file—rather than a set of changes.

However, every commit—well, almost every commit—also has a parent (previous) commit. If you ask Git, e.g., what happened in commit a123456?, what Git does is find the parent of a123456, extract that snapshot, then extract a123456 itself, and then compare the two. Whatever is different in a123456, that's what Git will tell you.

Since each commit is a complete snapshot, it's easy to revert to a particular version of a particular file in a particular commit. You just tell Git: Get me file b.ext from commit a123456, for instance, and now you have the version of file b.ext from commit a123456. That's what you seemed to be asking, and hence what the linked question and current answer (as of when I am typing this) provide. You edited your question, though, to ask for something rather different.
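To make the "get me this file from that commit" idea concrete, here is a minimal sketch in a throwaway repository. The file name b.ext comes from the text above; the file contents, commit messages, and the variable name old are made up for illustration, and old simply stands in for a123456.

```shell
set -e
cd "$(mktemp -d)"
git init -q
git config user.name "Example"                 # throwaway identity for this sketch
git config user.email "example@example.com"

echo "first version" > b.ext
git add b.ext
git commit -q -m "first"
old=$(git rev-parse HEAD)                      # plays the role of a123456

echo "second, newer version" > b.ext
git commit -q -am "second"

# Print the file exactly as stored in the old snapshot; the work-tree is untouched.
git show "$old:b.ext"

# Or replace the work-tree (and index) copy with the version from that snapshot.
git checkout "$old" -- b.ext
cat b.ext
```

Either form reads the file out of the saved snapshot; no diffing is involved.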

A bit more background

I now have to guess at the actual hash IDs for each of your five commits. (Every commit has a unique hash ID. The hash ID—the big ugly string of letters and numbers—is the "true name" of the commit. This hash ID never changes; using it always gets you that commit, as long as that commit exists.) But they're big ugly strings of letters and numbers, so instead of guessing, say, 8858448bb49332d353febc078ce4a3abcc962efe, I'll just call your "commit 1" D, your "commit 2" E, and so on.

Since most commits have a single parent, which lets Git hop backwards from the newer commits to the older ones, let's arrange them in a line with these backwards arrows:

... <-D <-E <-F <-G <-H   <--master

A branch name like master really just holds the hash ID of the latest commit on that branch. We say that the name points to the commit, because it has the hash ID that lets Git retrieve the commit. So master points to H. But H has the hash ID of G, so H points to its parent G; G has the hash ID of F; and so on. That's how Git manages to show you commit H in the first place: you ask Git Which commit is master? and it says H. You ask Git to show you H and Git extracts both G and H and compares them, and tells you what changed in H.
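If you want to see this pointing for yourself, the following sketch (again in a scratch repository, with invented file contents and with commit messages G and H standing in for the real commits) asks Git which commit master names and which commit that one points back to.

```shell
set -e
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master        # make sure the branch is named master
git config user.name "Example"
git config user.email "example@example.com"

echo "one" > a
git add a
git commit -q -m "G"
echo "three" > a
git commit -q -am "H"

git rev-parse master       # the hash ID stored under the name master (commit H)
git rev-parse master^      # the hash ID of H's parent (commit G)
git cat-file -p master     # the raw commit object; note its "parent" line
```

The parent line printed by git cat-file is the backwards arrow in the drawing above.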

What you've asked for

I want to revert the changes happened to File b of Commit 3 [F]. But I want the changes [that] happened to this file in commit 4 and 5 [G and H].

Note that this version of the file probably does not appear in any commit. If we take the file as it appears in commit E (your Commit 2), we get one without the changes from F, but it does not have the changes from G and H added to it. If we go ahead and do add the changes from G to it, that's probably not the same as what's in G; and if we add the changes from H to it after that, that's probably not the same as what's in H.

Obviously, then, this is going to be a little harder.

Git provides git revert to do this, but it does too much

The git revert command is designed to do this sort of thing, but it does it on a commit-wide basis. What git revert does, in effect, is to figure out what changed in some commit, and then (try to) undo just those changes.

Here is a fairly good way to think of git revert: it turns the commit (the snapshot) into a set of changes, just like every other Git command that views a commit, by comparing the commit to its parent. So for commit F, it would compare it to commit E, finding the changes to files a, b, and c. Then, and here's the first tricky bit, it reverse-applies those changes to your current commit. That is, since you're actually on commit H, git revert can take whatever is in all three files (a, b, and c) and (try to) undo exactly what got done to them in commit F.

(It's actually a bit more complicated, because this "undo the changes" idea also takes into account the other changes made since commit F, using Git's three-way merge machinery.)

Having reverse-applied all the changes from some particular commit, git revert now makes a new commit. So if you did a git revert <hash of F> you'd get a new commit, which we can call I:

...--D--E--F--G--H--I   <-- master

in which F's changes to all three files are backed out, producing three versions that probably aren't in any of the earlier commits. But that's too much: you only wanted F's changes to b backed out.

The solution is thus to do a little less, or do too much and then fix it up

We already described the git revert action as: find the changes, then reverse-apply the same changes. We can do this manually, on our own, using a few Git commands. Let's start with git diff or the short-hand version, git show: both of these turn snapshots into changes.

  • With git diff, we point Git to the parent E and the child F and ask Git: What's the difference between these two? Git extracts the files, compares them, and shows us what changed.

  • With git show, we point Git to commit F; Git finds the parent E on its own, extracts the files, compares them, and shows us what changed (prefixed with the log message). That is, git show <commit> amounts to git log (for just that one commit) followed by git diff (from that commit's parent, to that commit).

The changes that Git will show are, in essence, instructions: they tell us that if we start with the files that are in E, remove some lines, and insert some other lines, we'll get the files that are in F. So we just need to reverse the diff, which is easy enough. In fact, we have two ways to do this: we can swap the hash IDs with git diff, or we can use the -R flag to either git diff or git show. Then we'll get instructions that say, in essence: If you start with the files from F, and apply these instructions, you'll get the files from E.
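Here is a small sketch of the forward and reversed instructions side by side. It assumes a scratch repository with a 12-line file b (the contents are invented), and uses shell variables E and F, which are names I made up to hold the two hash IDs.

```shell
set -e
cd "$(mktemp -d)"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

seq 1 12 > b                                   # b as it is in E
git add b
git commit -q -m "E"
E=$(git rev-parse HEAD)

seq 1 12 | sed 's/^2$/2 (changed in F)/' > b   # F changes line 2 of b
git commit -q -am "F"
F=$(git rev-parse HEAD)

git diff "$E" "$F" -- b       # forward:  removes "2", adds "2 (changed in F)"
git diff "$F" "$E" -- b       # swapped hash IDs: removes "2 (changed in F)", adds "2"
git diff -R "$E" "$F" -- b    # -R: equivalent reversed instructions
git show -R "$F" -- b         # reversed as well, preceded by F's log message
```

The last three commands all describe the same edit: how to get from F's b back to E's b.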

Of course, these instructions will tell us to make changes to all three files, a, b, and c. But now we can strip away the instructions for two of the three files, leaving only the instructions we want.

There are, again, multiple ways to do this. The obvious one is to save all the instructions in a file, and then edit the file:

git show -R hash-of-F > /tmp/instructions

(and then edit /tmp/instructions). There's an even easier way, though, which is to tell Git: only bother showing instructions for particular files. The file we care about is b, so:

git show -R hash-of-F -- b > /tmp/instructions

If you check the instructions file, it should now describe how to take what's in F and un-change b to make it look like what's in E instead.

Now we just need to have Git apply these instructions, except that instead of the file from commit F, we'll use the file from the current commit H, which is already sitting in our work-tree ready to be patched. The Git command that applies a patch—a set of instructions on how to change some set of files—is git apply, so:

git apply < /tmp/instructions

should do the trick. (Note, though, that this will fail if the instructions say to change lines in b that were subsequently changed by commits G or H. This is where git revert is smarter, because it can do that whole "merge" thing.)

Once the instructions are successfully applied, we can look over the file, make sure it looks right, and use git add and git commit as usual.
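Putting those steps together, here is a sketch you can run in a scratch directory. The repository layout is an assumption made for illustration: only files a and b, invented contents, and a single later commit G standing in for your commits 4 and 5. The later change to b is deliberately far away from F's change so that the plain patch applies.

```shell
set -e
cd "$(mktemp -d)"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

# E: the starting point.
echo "a in E" > a
seq 1 12 > b
git add a b
git commit -q -m "E"

# F: changes a, and line 2 of b.
echo "a changed in F" > a
seq 1 12 | sed 's/^2$/2-F/' > b
git commit -q -am "F"
F=$(git rev-parse HEAD)

# G: changes b again, far away from F's change (line 11).
seq 1 12 | sed -e 's/^2$/2-F/' -e 's/^11$/11-G/' > b
git commit -q -am "G"

# Reverse only F's change to b and apply it to the current work-tree copy.
git show -R "$F" -- b | git apply
git diff --stat                                # only b should be listed
git commit -q -am "Back out F's changes to b"
```

Afterwards b has its original line 2 back while keeping the later change to line 11, and a is untouched.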

(Side note: you can do this all in one pass using:

git show -R hash -- b | git apply

And, git apply has its own -R or --reverse flag as well, so you can spell this:

git show hash -- b | git apply -R

which does the same thing. There are additional git apply flags, including -3 / --3way that will let it do fancier things, much like git revert does.)
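As a sketch of where -3 helps, the scratch repository below has a later commit change a line of b that sits within the context of F's change (line 5, next to F's change at line 2). The contents and the patch file created with mktemp are invented for illustration. I'd expect the plain git apply to refuse here because the context no longer matches, while the three-way merge goes through, but check the output on your own Git version.

```shell
set -e
cd "$(mktemp -d)"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

seq 1 12 > b                                             # E
git add b
git commit -q -m "E"
seq 1 12 | sed 's/^2$/2-F/' > b                          # F: line 2
git commit -q -am "F"
F=$(git rev-parse HEAD)
seq 1 12 | sed -e 's/^2$/2-F/' -e 's/^5$/5-G/' > b       # G: line 5, close to line 2
git commit -q -am "G"

patch=$(mktemp)
git show -R "$F" -- b > "$patch"

# --check only tests whether the patch would apply; it changes nothing.
if git apply --check "$patch" 2>/dev/null; then
    echo "plain git apply would work"
else
    echo "plain git apply refuses; using a three-way merge instead"
fi

git apply -3 "$patch"      # uses the blob IDs recorded in the patch; result is also staged
cat b
```

Note that -3 also updates the index, so the work-tree copy of b must match the index (a clean checkout) before you run it.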

The "do too much, then back some of it out" approach

The other relatively easy way to deal with this is to let git revert do all of its work. This will, of course, also back out the changes to the other files, which you didn't want backed out. But we showed at the top how ridiculously easy it is to get any file from any commit. Suppose, then, that we let Git undo all the changes in F:

git revert hash-of-F

which makes new commit I that backs out everything in F:

...--D--E--F--G--H--I   <-- master

It's now trivial to git checkout the two files a and c from commit H:

git checkout hash-of-H -- a c

and then make a new commit J:

...--D--E--F--G--H--I--J   <-- master

The file b in both I and J is the way we want it, and the files a and c in J are the way we want them—they match the files a and c in H—so we're now pretty much done, except for the annoying extra commit I.
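The whole sequence, as a sketch in a scratch repository: the file contents are invented, a single later commit stands in for both G and H, and the variables F and H are names I made up to hold the hash IDs. The --no-edit flag just stops git revert from opening an editor for the commit message.

```shell
set -e
cd "$(mktemp -d)"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

# E: the starting point.
echo "a in E" > a
echo "c in E" > c
seq 1 12 > b
git add a b c
git commit -q -m "E"

# F: changes all three files.
echo "a changed in F" > a
echo "c changed in F" > c
seq 1 12 | sed 's/^2$/2-F/' > b
git commit -q -am "F"
F=$(git rev-parse HEAD)

# H: a later change to b only (standing in for G and H).
seq 1 12 | sed -e 's/^2$/2-F/' -e 's/^11$/11-G/' > b
git commit -q -am "H"
H=$(git rev-parse HEAD)

git revert --no-edit "$F"        # commit I: backs out F's changes to a, b, and c
git checkout "$H" -- a c         # take a and c back from H
git commit -q -m "Restore a and c from H"    # commit J
git log --oneline                # I and J now sit on top of H
```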

We can get rid of I in several ways:

  • Use git commit --amend when making J: this pushes I out of the way, by making commit J use commit H as J's parent. Commit I still exists, it's just abandoned. Eventually (after roughly a month) it expires and really goes away.

    The commit graph, if we do this, looks like this:

                       I   [abandoned]
                      /
    ...--D--E--F--G--H--J   <-- master
    
  • Or, git revert has a -n flag that tells Git: Do the revert, but don't commit the result. (This also enables doing a revert with a dirty index and/or work-tree, though if you make sure you start with a clean checkout of commit H you don't need to worry about what this means.) Here we'll start with H, revert F, then tell Git to get files a and c back from commit H:

    git revert -n hash-of-F
    git checkout HEAD -- a c
    git commit

    Since we're on commit H when we do this, we can use the name HEAD to refer to the copies of a and c that are in commit H.
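Both variants can be tried in a scratch repository with the sketch below. The file contents, the variables F, H, and amended_tree, and the use of git reset --hard to return to H between the two variants are all my own additions for illustration; a single later commit stands in for G and H.

```shell
set -e
cd "$(mktemp -d)"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

echo "a in E" > a
echo "c in E" > c
seq 1 12 > b
git add a b c
git commit -q -m "E"
echo "a changed in F" > a
echo "c changed in F" > c
seq 1 12 | sed 's/^2$/2-F/' > b
git commit -q -am "F"
F=$(git rev-parse HEAD)
seq 1 12 | sed -e 's/^2$/2-F/' -e 's/^11$/11-G/' > b
git commit -q -am "H"
H=$(git rev-parse HEAD)

# Variant 1: revert (commit I), then fold the fix-up into it with --amend.
git revert --no-edit "$F"
git checkout "$H" -- a c
git commit -q --amend -m "Back out F's changes to b only"   # J replaces I; parent is H
amended_tree=$(git rev-parse 'HEAD^{tree}')

# Variant 2: go back to H and never create I in the first place.
git reset -q --hard "$H"
git revert -n "$F"                # index and work-tree updated, nothing committed yet
git checkout HEAD -- a c          # HEAD is still H
git commit -q -m "Back out F's changes to b only"

git diff --name-only HEAD^ HEAD   # should list only b
```

Either way the new commit sits directly on top of H and differs from it only in b.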

(Git being Git, there are a half dozen additional ways to do this; I'm just using the ones that we're illustrating in general here.)