When to use git subtree?

Lernkurve · Sep 5, 2015

What problem does git subtree solve? When and why should I use that feature?

I've read that it is used for repository separation. But why would I not just create two independent repositories instead of sticking two unrelated ones into one?

This GitHub tutorial explains how to perform Git subtree merges.

I kind of know how to use it, but not when (use cases) and why, and how it relates to git submodule. I'd use submodules when I have a dependency on another project or library.

Answer

Matthew Sanders · Nov 7, 2015

Be explicit about what you mean when you use the term 'subtree' in the context of git, because there are actually two separate but related topics here:

git-subtree and git subtree merge strategy.

The TL;DR

Both subtree-related concepts effectively allow you to manage multiple repositories in one. This is in contrast to git-submodule, where the root repository stores only metadata (the .gitmodules file plus a reference to the commit each submodule is pinned to) and you must manage the external repositories separately.

More Details

The git subtree merge strategy is basically the more manual method, using the commands from the tutorial you referenced.
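As a rough illustration of that manual method, here is a minimal sketch that uses only core git commands. The repository names (upstream-lib, app), the remote name (lib_remote), the branch name (main) and the lib/ prefix are all made up for the demo, and it builds two throwaway repositories in a temporary directory so it can be run safely. It assumes a git recent enough to know --allow-unrelated-histories.

```shell
#!/bin/sh
# Sketch of the manual subtree merge workflow using only core git commands.
# All repository, remote, branch and directory names here are hypothetical.
set -eu

work=$(mktemp -d)
cd "$work"

# A throwaway "upstream" project standing in for the external repository.
git init -q upstream-lib
cd upstream-lib
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"
echo "v1" > lib.txt
git add lib.txt
git commit -q -m "lib: initial commit"
cd ..

# The super project that will embed upstream-lib under lib/.
git init -q app
cd app
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"
echo "app" > app.txt
git add app.txt
git commit -q -m "app: initial commit"

# Step 1: make the upstream history available locally.
git remote add lib_remote "$work/upstream-lib"
git fetch -q lib_remote

# Step 2: record a merge of the two histories without touching any files,
# then read the upstream tree into the lib/ prefix and commit the result.
git merge -q -s ours --no-commit --allow-unrelated-histories lib_remote/main
git read-tree --prefix=lib/ -u lib_remote/main
git commit -q -m "Merge upstream-lib as a subtree under lib/"

# Step 3: upstream moves on; bring the change in with the subtree strategy.
(cd "$work/upstream-lib" && echo "v2" > lib.txt && git commit -q -am "lib: v2")
git fetch -q lib_remote
git merge -q -s subtree -m "Merge upstream-lib changes into lib/" lib_remote/main

# Show the embedded copy after the update.
cat lib/lib.txt
```

After the last merge, lib/lib.txt in the super project should carry the upstream change, while app.txt is untouched. Note that the remote added in step 1 lives only in the local configuration, which is the bookkeeping the wrapper script and the later discussion are concerned with.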

git-subtree is a wrapper shell script that provides a more natural syntax. At the time of writing it is still part of contrib and not fully integrated into git with the usual man pages. The documentation is instead stored alongside the script.

Here is the usage info:

NAME
----
git-subtree - Merge subtrees together and split repository into subtrees


SYNOPSIS
--------
[verse]
'git subtree' add   -P <prefix> <commit>
'git subtree' add   -P <prefix> <repository> <ref>
'git subtree' pull  -P <prefix> <repository> <ref>
'git subtree' push  -P <prefix> <repository> <ref>
'git subtree' merge -P <prefix> <commit>
'git subtree' split -P <prefix> [OPTIONS] [<commit>]
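To make the synopsis a little more concrete, here is a hedged sketch that runs add, pull and split against two throwaway repositories. It assumes the contrib git-subtree script is installed on your machine (many packages ship it, but not all), and every repository, branch and prefix name in it (upstream-lib, app, main, lib, lib-only) is hypothetical.

```shell
#!/bin/sh
# Sketch of add / pull / split with git-subtree.
# Assumes git-subtree is installed; all names below are hypothetical.
set -eu

work=$(mktemp -d)
cd "$work"

# A throwaway "upstream" project standing in for the external repository.
git init -q upstream-lib
cd upstream-lib
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"
echo "v1" > lib.txt
git add lib.txt
git commit -q -m "lib: initial commit"
cd ..

# The super project.
git init -q app
cd app
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"
echo "app" > app.txt
git add app.txt
git commit -q -m "app: initial commit"

# add: fetch <ref> from <repository> and import it under the lib/ prefix.
git subtree add -P lib "$work/upstream-lib" main

# Upstream gains a new commit.
(cd "$work/upstream-lib" && echo "v2" > lib.txt && git commit -q -am "lib: v2")

# pull: fetch the same ref again and merge it into lib/.
git subtree pull -P lib -m "Merge upstream-lib changes into lib/" \
    "$work/upstream-lib" main

# split: extract the history of lib/ onto its own branch.
git subtree split -P lib -b lib-only

# The split branch has lib.txt at its root rather than under lib/.
git show lib-only:lib.txt
```

Compared with the manual approach, no named remote is needed here; the repository location is passed on each call, which is convenient but also means nothing in the repository remembers where lib/ came from.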

I have come across a pretty good number of resources on the subject of subtrees, as I was planning on writing a blog post of my own. I will update this post if I do, but for now here is some relevant information to the question at hand:

Much of what you are looking for can be found in this Atlassian blog post by Nicola Paolucci; the relevant section is quoted below:

Why use subtree instead of submodule?

There are several reasons why you might find subtree better to use:

  • Management of a simple workflow is easy.
  • Older versions of git are supported (even before v1.5.2).
  • The sub-project’s code is available right after the clone of the super project is done.
  • subtree does not require users of your repository to learn anything new, they can ignore the fact that you are using subtree to manage dependencies.
  • subtree does not add new metadata files like submodule does (i.e. .gitmodules).
  • Contents of the module can be modified without having a separate repository copy of the dependency somewhere else.

In my opinion the drawbacks are acceptable:

  • You must learn about a new merge strategy (i.e. subtree).
  • Contributing code back upstream for the sub-projects is slightly more complicated.
  • The responsibility of not mixing super and sub-project code in commits lies with you.

I agree with much of this, and I recommend reading the article, as it walks through some common usage.
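The last drawback in the quoted list, keeping super-project and sub-project changes out of the same commit, is something you can at least check for mechanically. The following is only a sketch under assumptions of my own: the subtree lives under a hypothetical lib/ prefix, and the demo history is created on the spot. It walks the history and reports commits whose changed files fall both inside and outside the prefix.

```shell
#!/bin/sh
# Sketch: flag commits that touch both the subtree prefix and the rest of the
# project. The prefix "lib" and the demo commits are hypothetical.
set -eu

work=$(mktemp -d)
cd "$work"
git init -q app
cd app
git config user.name "Example User"
git config user.email "user@example.com"

# Three demo commits: super project only, sub-project only, and a mixed one.
echo "app" > app.txt
git add app.txt
git commit -q -m "app: initial commit"

mkdir lib
echo "lib" > lib/lib.txt
git add lib/lib.txt
git commit -q -m "lib: add library file"

echo "more" >> app.txt
echo "more" >> lib/lib.txt
git commit -q -am "mixed: touch app and lib together"

prefix="lib"
mixed_count=0
for commit in $(git rev-list HEAD); do
    # Files changed by this commit. --root also covers the first commit;
    # merge commits produce no output here and are therefore skipped.
    files=$(git diff-tree --no-commit-id --name-only -r --root "$commit")
    if printf '%s\n' "$files" | grep -q "^$prefix/" &&
       printf '%s\n' "$files" | grep -qv "^$prefix/"; then
        mixed_count=$((mixed_count + 1))
        echo "mixed commit: $(git log -1 --format='%h %s' "$commit")"
    fi
done
echo "mixed commits found: $mixed_count"
```

In the demo history only the third commit is reported. Something along these lines could be run before pushing sub-project changes upstream, since mixed commits are the ones that make that step more awkward.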

You may have noticed that he has also written a follow-up, in which he points out an important detail that this approach leaves out:

git-subtree currently fails to include the remote!

This oversight is probably because people often add a remote manually when working with subtrees, but that isn't stored in git either. The author details a patch he has written that adds this metadata to the commit git-subtree already generates. Until it makes it into the official git mainline, you could do something similar by modifying the commit message or storing the information in another commit.
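As one possible way to do the "storing it in another commit" variant, here is a small sketch. It is not the author's patch: the subtree-dir and subtree-remote lines are a convention I made up for this example, and the URL is a placeholder. It records the upstream location in the body of an empty commit and later reads it back from the history.

```shell
#!/bin/sh
# Sketch: remember where a subtree came from by recording it in a commit.
# The "subtree-dir" / "subtree-remote" lines are an invented convention,
# not something git or git-subtree defines; the URL is a placeholder.
set -eu

work=$(mktemp -d)
cd "$work"
git init -q app
cd app
git config user.name "Example User"
git config user.email "user@example.com"
echo "app" > app.txt
git add app.txt
git commit -q -m "app: initial commit"

prefix="lib"
remote_url="https://example.com/upstream-lib.git"

# Store the metadata in the body of an otherwise empty commit.
git commit -q --allow-empty \
    -m "Record upstream remote for $prefix/" \
    -m "subtree-dir: $prefix
subtree-remote: $remote_url"

# Later (or on another clone): look up the most recent record for the prefix.
recorded=$(git log -1 --grep="^subtree-dir: $prefix\$" --format=%b |
    sed -n 's/^subtree-remote: //p')
echo "upstream for $prefix/ is $recorded"
```

Because the record travels with the history, every clone can recover the upstream location without relying on someone's local remote configuration.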

I also find this blog post very informative. The author adds a third subtree method, which he calls git-stree, to the mix. The article is worth a read, as he does a pretty good job of comparing the three approaches. He gives his personal opinion of what he does and doesn't like and explains why he created the third approach.

Closing Thoughts

This topic shows both the power of git and the fragmentation that can occur when a feature just misses the mark.

I have personally developed a distaste for git-submodule, as I find it more confusing for contributors to understand. I also prefer to keep ALL of my dependencies managed within my projects, which makes for an easily reproducible environment without having to manage multiple repositories. git-submodule, however, is currently far better known, so it is good to be aware of it, and depending on your audience that may sway your decision.