GIT - Remove old reflog entries

codekandis · Mar 2, 2018 · Viewed 11.3k times

After a lot of rebasing to fit a repository to our latest needs, our reflog is full of commits and orphaned branches. We have reached the final state of our reorganization.

Because the leftover branches and commits contain a lot of binary data, the repository has grown to several times its original size, so we decided to purge all the old reflog entries and data.

I dug through the manual and experimented with git-reflog expire, but I didn't get much wiser.

This is an example of the log (shortened):

-> <sha1> [development] ...
| <sha1> ...
| <sha1> ...
| <sha1> ...
| <sha1> ...
| <sha1> ...
| <sha1> ...
| <sha1> ...
-> <sha1> [master] ...
-> <sha1-old> ...
| <sha1-old> ...
| <sha1-old> ...
| <sha1-old> ...
| <sha1-old> ...
| <sha1-old> ...
| <sha1-old> ...
| <sha1-old> ...
-> <sha1-old> ...

As you can see, below the master branch there are the old commits / branches showing the state of the repository before the rebase.

We want to clear the reflog so that the repository looks like this:

-> <sha1> [development] ...
| <sha1> ...
| <sha1> ...
| <sha1> ...
| <sha1> ...
| <sha1> ...
| <sha1> ...
| <sha1> ...
-> <sha1> [master] ...

In turn, we expect this to reduce the disk space used by the repository.

How can I accomplish that?


Edit (2019-03-02 12:20)

Please do not suggest deleting and re-cloning the repository. This is not what I'm looking for.


Edit (2019-03-02 12:30)

What I have tried so far, without success:

git reflog expire --expire=all

Nothing happened, so I tried to be clever and invoked the garbage collector:

git gc --aggressive

But that had no effect either.

Answer

torek · Mar 2, 2018

You need, specifically, the --expire-unreachable option:

git reflog expire --expire=90.days.ago --expire-unreachable=now --all

for instance.
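As a rough sketch of how this could be tried out safely, the script below builds a throwaway repository, rewrites a commit with --amend so that the original survives only in the reflogs, and then runs the command above. The follow-up git gc --prune=now is not part of the command above. It is added here on the assumption that the goal is to reclaim disk space, since expiring reflog entries only drops the references and does not by itself delete the objects. The temporary directory, file names, identity, and commit messages are made up for illustration.

```shell
set -e

# Throwaway repository (hypothetical location and contents).
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"

echo one > file.txt
git add file.txt
git commit -q -m "original"
old=$(git rev-parse HEAD)

# Rewrite the commit; the original is now referenced only by reflog entries.
echo two > file.txt
git commit -q -a --amend -m "rewritten"

if git cat-file -e "$old" 2>/dev/null; then before=present; else before=gone; fi

# Expire reflog entries that point at unreachable commits, in every reflog.
git reflog expire --expire=90.days.ago --expire-unreachable=now --all

# Assumption: also prune the now-unreferenced objects immediately.
git gc --quiet --prune=now

if git cat-file -e "$old" 2>/dev/null; then after=present; else after=gone; fi
echo "old commit before: $before, after: $after"
```

On a real repository only the git reflog expire and git gc lines apply. Both are destructive, so taking a backup copy first is prudent.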

What's the difference?

A reflog is a log for a reference (hence the name "reflog" :-) ). A reference or ref is a name beginning with refs/, such as refs/heads/master, which is how the branch name master is really stored. There is one extra reflog, for HEAD itself, which (since it doesn't start with refs/) is technically not a reference by the definition I linked in the gitglossary, but then, the glossary definition goes on to say that there are some special references that don't start with refs/, so either they're confused, or I am. :-)

Anyway, the point of a reference is to store a hash ID (or in the case of the special HEAD reference, to store the name of another reference). A hash ID is a value. You can update a reference, which changes the stored value—so over time, the single name has taken on multiple different values. There's the current value master, and then there is the one from one change ago, master@{1}, and from two changes ago, master@{2}, and so on. (For consistency, you can spell the current value master@{0} if you like.) This is all spelled out in the gitrevisions documentation.

The reflog is where Git keeps the previous values. The reflog stores not only the previous value, but also the computer's clock-time when the value was changed—so Git can handle syntax like master@{3.days.ago} to find whichever entry, master@{0} or master@{1} or master@{2} or whatever, represents the value master had three days ago. ("Three days" means 3 24-hour days: 72 hours and no minutes and no seconds ago, or precisely 259200 seconds ago. If you changed master several times yesterday, you may need to be more precise than just master@{yesterday}.)
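To make the name@{n} notation concrete, here is a small sketch that assumes a scratch repository with two commits on master. The directory, file name, identity, and commit messages are invented for the example.

```shell
set -e

# Scratch repository (hypothetical names throughout).
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"

echo one > file.txt
git add file.txt
git commit -q -m "first"
first=$(git rev-parse HEAD)

echo two > file.txt
git commit -q -a -m "second"
second=$(git rev-parse HEAD)

# master@{0} is the current value, master@{1} the value one change ago.
current=$(git rev-parse 'master@{0}')
previous=$(git rev-parse 'master@{1}')

# List the entries together with the clock time recorded for each change.
git reflog show --date=iso master
```

Here master@{0} resolves to the second commit and master@{1} to the first, because each commit moved master exactly once.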

Anyway, so, suppose that the current value of master is 1234567... (some big ugly hash ID), and that master@{1} is 8888888... while master@{2} is 3333333.... So far, they all seem rather alike. But they aren't necessarily so:

          1234567   <-- master
            /
...--o--8888888   [master@{1}]
      \
    3333333   [master@{2}]

The difference between master@{1} and master@{2} here—well, besides their values and the numbers inside the curly braces {}—the important difference to git reflog expire is that we can find master@{1} by starting from master (1234567) and working backwards. If we start at master and go back one commit, we come to master@{1}. If we go back another step we arrive at boring commit o whose number we don't even know; we skip right over commit 3333333.

Specifically, in this case, master@{2} is unreachable from the current (1234567) value of master. So its expiry is controlled by the --expire-unreachable argument, not by the --expire argument.
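One way to check reachability by hand is git merge-base --is-ancestor, which exits with status 0 when the first commit can be reached by walking back from the second. The sketch below rebuilds a history shaped like the drawing above in a scratch repository. The commit messages merely borrow the placeholder hashes from the drawing, and the check helper is a hypothetical name.

```shell
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"

echo base > file.txt
git add file.txt
git commit -q -m "o"                     # the boring commit o

echo first-try > file.txt
git commit -q -a -m "3333333"            # ends up as master@{2}

echo second-try > file.txt
git commit -q -a --amend -m "8888888"    # sibling of the above; master@{1}

echo more > file.txt
git commit -q -a -m "1234567"            # current value of master

# Hypothetical helper: is the given revision reachable from master?
check() {
    if git merge-base --is-ancestor "$1" master; then
        echo reachable
    else
        echo unreachable
    fi
}

reach1=$(check 'master@{1}')
reach2=$(check 'master@{2}')
echo "master@{1}: $reach1"
echo "master@{2}: $reach2"
```

The amend step is what produces the fork: it replaces the tip with a sibling commit, so the earlier tip stays in the reflog but drops out of the branch's history.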

If you don't choose a particular value, git reflog expire will use the configured default (gc.reflogExpire for reachable entries, gc.reflogExpireUnreachable for unreachable ones), if you have configured one. In the absence of a configured default, the default defaults are 90 days for reachable entries and 30 days for unreachable entries. So:

--expire=90.days.ago --expire-unreachable=30.days.ago

is the default, unless you've changed your own defaults. If you override one default on the command line, you leave the other default alone.
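The configured defaults live in the gc.reflogExpire and gc.reflogExpireUnreachable settings. The following sketch, run in a scratch repository with invented file names and messages, sets them per repository and then calls git reflog expire --all with no expiry options, on the assumption that it picks the values up from the configuration.

```shell
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"

echo base > file.txt
git add file.txt
git commit -q -m "base"
echo draft > file.txt
git commit -q -a -m "draft"
echo final > file.txt
git commit -q -a --amend -m "final"      # leaves "draft" unreachable

# Repository-local defaults for reflog expiry.
git config gc.reflogExpire 90.days.ago
git config gc.reflogExpireUnreachable now
configured=$(git config --get gc.reflogExpireUnreachable)

before=$(git reflog show master | wc -l | tr -d ' ')
git reflog expire --all                  # no --expire options given
after=$(git reflog show master | wc -l | tr -d ' ')

echo "master reflog entries: $before before, $after after"
```

Setting the values with git config --global instead would make them the default for every repository of the current user.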

Rebase makes a lot of unreachables

Your question starts with an important point: you did a lot of rebasing. Rebase works by copying commits, then switching branch names to use the new (and presumably improved) commits. The old ones are still around, and are invariably unreachable from the new branch tip:

          A'-B'-C'  <-- branch
         /
...--o--o
         \
          A--B--C   [branch@{1}]

where A--B--C is the original chain (the old and icky commits) and A'-B'-C' are the shiny new copies that you want. Since the connections always go backwards, the old ones are always unreachable from the new branch tips, even if they're reachable from some other references.
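The effect can be reproduced in a scratch repository. In this sketch the branch is rebased onto one extra commit on master, which differs slightly from the drawing, and all names and file contents are made up. After the rebase, branch@{1} still names the old C, and that commit is no longer an ancestor of the branch tip.

```shell
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"

echo base > base.txt
git add base.txt
git commit -q -m "base"

# Original chain A--B--C on a topic branch.
git checkout -q -b branch
for name in A B C; do
    echo "$name" > "$name.txt"
    git add "$name.txt"
    git commit -q -m "$name"
done
old_tip=$(git rev-parse branch)

# Give master something new, then copy A, B, C on top of it.
git checkout -q master
echo more > more.txt
git add more.txt
git commit -q -m "more work on master"
git checkout -q branch
git rebase -q master

reflog_prev=$(git rev-parse 'branch@{1}')
if git merge-base --is-ancestor "$old_tip" branch; then
    status=reachable
else
    status=unreachable
fi
echo "old C is $status from the rebased branch"
```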