Flexible vs static branching (Git vs Clearcase/Accurev)

jbhope picture jbhope · Apr 18, 2009 · Viewed 14.4k times · Source

My question is about the way in which Git handles branches: whenever you branch from a commit, this branch won’t ever receive changes from the parent branch unless you force it with a merge.

But in other systems such us Clearcase or Accurev, you can specify how branches get filled with some sort of inheritance mechanism: I mean, with Clearcase, using a config_spec, you can say “get all the files modified on branch /main/issue001 and then continue with the ones on /main or with this specific baseline”.

In Accurev you also have a similar mechanism which let’s streams receive changes from upper branches (streams how they call them) without merging or creating a new commit on the branch.

Don’t you miss this while using Git? Can you enumerate scenarios where this inheritance is a must?

Thanks

Update Please read VonC answer below to actually focus my question. Once we agree "linear storage" and DAG based SCMs have different capabilities, my question is: which are the real life scenarios (especially for companies more than OSS) where linear can do things not possible for DAG? Are they worth?

Answer

VonC picture VonC · Apr 18, 2009

To understand why Git does not offer some kind of what you are referring to as an "inheritance mechanism" (not involving a commit), you must first understand one of the core concepts of those SCMs (Git vs. ClearCase for instance)

  • ClearCase uses a linear version storage: each version of an element (file or directory) is linked in a direct linear relationship with the the previous version of the same element.

  • Git uses a DAG - Directed Acyclic Graph: each "version" of a file is actually part of a global set of changes in a tree that is itself part of a commit. The previous version of that must be found in a previous commit, accessible through a single directed acyclic graph path.

In a linear system, a config spec can specify several rules for achieving the "inheritance" you see (for a given file, first select a certain version, and if not present, then select another version, and if not present, then select a third, and so on).

The branch is a fork in a linear history a given version for a given select rule (all the other select rules before that one still apply, hence the "inheritance" effect)

In a DAG, a commit represents all the "inheritance" you will ever get; there is no "cumulative" selection of versions. There is only one path in this graph to select all the files you will see at this exact point (commit).
A branch is just a new path in this graph.

To apply, in Git, some other versions, you must either:

But since Git is a DAG-based SCM, it will always result in a new commit.

What you are "losing" with Git is some kind of "composition" (when you are selecting different versions with different successive select rules), but that would not be practical in a DVCS (as in "Distributed"): when you are making a branch with Git, you need to do so with a starting point and a content clearly defined and easily replicated to other repositories.

In a purely central VCS, you can define your workspace (in ClearCase, your "view", either snapshot or dynamic) with whatever rules you want.


unknown-google adds in the comment (and in his question above):

So, once we see the two models can achieve different things (linear vs DAG), my question is: which are the real life scenarios (especially for companies more than OSS) where linear can do things not possible for DAG? Are they worth it?

When it comes to "real-life scenario" in term of selection rules, what you can do in a linear model is to have several selection rules for the same set of files.

Consider this "config spec" (i.e. "configuration specification" for selection rules with ClearCase):

element /aPath/... aLabel3 -mkbranch myNewBranch
element /aPath/... aLabel2 -mkbranch myNewBranch

It selects all the files labelled 'aLabel2' (and branch from there), except for those labelled 'aLabel3' - and branch from there - (because that rule precedes the one mentioning 'aLabel2').

Is it worth it?

No.

Actually, the UCM flavor of ClearCase (the "Unified Configuration Management" methodology included with the ClearCase product, and representing all the "best practices" deduced from base ClearCase usage) does not allow it, for reasons of simplificity. A set of files is called a "component", and if you want to branch for a given label (known as a "baseline"), that would be translated like this following config spec:

element /aPath/... .../myNewBranch
element /aPath/... aLabel3 -mkbranch myNewBranch
element /aPath/... /main/0 -mkbranch myNewBranch

You have to pick one starting point (here, 'aLabel3') and go from there. If you want also the files from 'aLabel2', you will make a merge from all the 'aLabel2' files to the ones in 'myNewBranch'.

That is a "simplification" you do not have to make with a DAG, where each node of the graph represents a uniquely defined "starting point" for a branch, whatever is the set of files involved.

Merge and rebase are enough to combine that starting point with other versions of a given set of files, in order to achieve the desired "composition", while keeping that particular history in isolation in a branch.

The general goal is to reason in "coherent Version Control operations applied to a coherent component". A "coherent" set of files is one in a well-defined coherent state:

  • if labelled, all its files are labelled
  • if branched, all its files will branch from the same unique starting point

That is easily done in a DAG system; it can be more difficult in a linear system (especially with "Base ClearCase" where the "config spec" can be tricky), but it is enforced with the UCM methodology of that same linear-based tool.

Instead of achieving that "composition" through a "private selection rule trick" (with ClearCase, some select rule order), you achieve it only with VCS operations (rebase or merge), which leave a clear trace for everyone to follow (as opposed to a config spec private to a developer, or shared amongst some but not all developers). Again, it enforces a senses of coherency, as opposed to a "dynamic flexibility", that you may have a hard time to reproduce later on.

That allows you to leave the realm of VCS (Version Control System) and enter the realm of SCM (Software Configuration Management), which is mainly concerned with "reproducibility". And that (SCM features) can be achieved with a linear-based or a DAG-based VCS.