What is the point of 'git submodule init'?

sampablokuper · Jun 5, 2017

Background

To populate a repository's submodules, one typically invokes:

git submodule init
git submodule update

In this usage, git submodule init seems to do only one thing: populate .git/config with information that is already in .gitmodules.

What is the point of that?

Couldn't git submodule update simply use the information from .gitmodules? This would avoid both:

  • an unnecessary command (git submodule init); and
  • an unnecessary duplication of data (.gitmodules content into .git/config).

Question

Either:

  • there are use-cases for git submodule init that I do not know (in which case, please enlighten me!); or else
  • git submodule init is cruft that could be deprecated in Git without any harm.

Which of these is true?

Answer

Nigiri · Oct 5, 2017

Imagine a repository has 10 submodules and you are interested in only two of them. In that case, you may want to pull updates for just those two from the remote repository from time to time. git submodule init works well for this: once you run git submodule init for those two submodules, git submodule update --remote applies only to them.


Below are demos of two workflows.

Workflow 1: Submodules are libraries that several projects use.

I think this is one of the common use cases.

You just cloned "my-project".

git clone https://example.com/demo/my-project

Its top-level structure looks like this:

[Image: top-level directory structure of my-project]

The contents of .gitmodules

[submodule "lib1"]
    path = lib1
    url = https://example.com/demo/lib1
[submodule "lib2"]
    path = lib2
    url = https://example.com/demo/lib2
[submodule "lib3"]
    path = lib3
    url = https://example.com/demo/lib3
[submodule "lib4"]
    path = lib4
    url = https://example.com/demo/lib4
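
Since .gitmodules uses Git's config file syntax, you can list what a project declares before initializing anything. This is only a rough sketch: it writes the example file above into a throwaway directory (the `work` and `declared` names are mine, not part of the project) and reads it back with git config.

```shell
#!/bin/sh
set -eu

# Write the example .gitmodules into a throwaway directory (illustrative only).
work=$(mktemp -d)
cat > "$work/.gitmodules" <<'EOF'
[submodule "lib1"]
    path = lib1
    url = https://example.com/demo/lib1
[submodule "lib2"]
    path = lib2
    url = https://example.com/demo/lib2
[submodule "lib3"]
    path = lib3
    url = https://example.com/demo/lib3
[submodule "lib4"]
    path = lib4
    url = https://example.com/demo/lib4
EOF

# Each match is printed as "<key> <value>"; keep only the value (the path).
declared=$(git config --file "$work/.gitmodules" \
    --get-regexp '^submodule\..*\.path$' | cut -d' ' -f2)

# Print one declared submodule path per line.
echo "$declared"
```

Nothing here touches .git/config; the file only declares which submodules exist and where they come from.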

You want to refactor code1.js, which references lib1 and lib2, so you don't need to clone and check out lib3 and lib4. You just run the command below.

git submodule init lib1 lib2

Now let's see the contents of .git/config

...
[submodule "lib1"]
    active = true
    url = https://example.com/demo/lib1
[submodule "lib2"]
    active = true
    url = https://example.com/demo/lib2

This means something like "Ready to update lib1 and lib2 from example.com/demo".

At this point, the lib1 and lib2 directories are still empty. You can clone and check out lib1 and lib2 with one command:

git submodule update

Now you are able to refactor code1.js without import errors.
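
If you want to try the steps so far without a real server, here is a minimal sketch. It assumes local throwaway repositories standing in for the example.com/demo URLs, and the `g` wrapper is just my helper: it allows local-path submodule URLs (recent Git versions block them by default) and sets a commit identity.

```shell
#!/bin/sh
set -eu

# Throwaway sandbox; local repositories stand in for example.com/demo/*.
work=$(mktemp -d)
cd "$work"

# Hypothetical helper: permit local-path submodule URLs, fix the initial
# branch name, and provide a commit identity for this demo only.
g() {
    git -c protocol.file.allow=always -c init.defaultBranch=master \
        -c user.name=demo -c user.email=demo@example.com "$@"
}

# Create four tiny library repositories.
for lib in lib1 lib2 lib3 lib4; do
    g init -q "$lib"
    echo "$lib" > "$lib/README"
    g -C "$lib" add README
    g -C "$lib" commit -q -m "initial $lib"
done

# Create the superproject and register all four libraries as submodules.
g init -q my-project
for lib in lib1 lib2 lib3 lib4; do
    g -C my-project submodule --quiet add "$work/$lib" "$lib"
done
g -C my-project commit -q -m "add submodules"

# A fresh clone has four empty submodule directories.
g clone -q my-project clone
cd clone

# Initialize only the two we need, then populate them.
g submodule --quiet init lib1 lib2
g submodule --quiet update

# Count the submodules that init recorded in .git/config.
initialized=$(git config --get-regexp '^submodule\..*\.url$' | wc -l | tr -d ' ')
echo "initialized submodules: $initialized"

# Lines starting with "-" are submodules that were never initialized.
git submodule status
```

In the status output, lib3 and lib4 should keep their leading "-" and their directories stay empty, because update only touches what was initialized.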

Submodules are just references to specific commits, so when you want to move the libraries to newer versions, you have to update those references. You can do that with the command below.

git submodule update --remote

Now you can see how useful it is to only initialize the submodules you need.
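
One detail worth seeing in action: --remote moves the submodule checkout to the tip of its remote branch, but my-project keeps pointing at the old commit until you commit the change. A minimal sketch, again assuming a local throwaway repository in place of example.com/demo/lib1 and my own `g` helper:

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
cd "$work"

# Hypothetical helper: permit local-path submodule URLs, fix the initial
# branch name, and provide a commit identity for this demo only.
g() {
    git -c protocol.file.allow=always -c init.defaultBranch=master \
        -c user.name=demo -c user.email=demo@example.com "$@"
}

# One library, and a superproject that pins the library's first commit.
g init -q lib1
echo v1 > lib1/VERSION
g -C lib1 add VERSION
g -C lib1 commit -q -m "v1"

g init -q my-project
g -C my-project submodule --quiet add "$work/lib1" lib1
g -C my-project commit -q -m "add lib1"

# Clone the superproject and populate lib1 at the pinned commit.
g clone -q my-project clone
cd clone
g submodule --quiet init lib1
g submodule --quiet update

# Meanwhile, the library gains a new commit upstream.
echo v2 > "$work/lib1/VERSION"
g -C "$work/lib1" commit -q -am "v2"

# Fetch and check out the tip of lib1's remote branch.
g submodule --quiet update --remote
version=$(cat lib1/VERSION)
echo "lib1 is now at: $version"

# The superproject sees lib1 as modified until the new commit is recorded.
pending=$(git status --porcelain lib1)
echo "pending change: $pending"
g add lib1
g commit -q -m "Update lib1 to latest"
remaining=$(git status --porcelain)
```

After the final commit, anyone who clones my-project and runs git submodule update gets the newer lib1 without needing --remote.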

Workflow 2: Each submodule is a project and one big top project includes them.

I'm a fan of this.

You clone "main-project".

git clone https://example.com/demo/main-project

Its top-level structure looks like this:

[Image: top-level directory structure of main-project]

You can see a directory named "shared". There is one rule in this workflow: if you want to use main-project's shared code in your project, you have to create that project as a submodule of main-project.

I like to put entity classes in the shared directory, like this:

[Image: entity classes in the shared directory]

Back to the submodule workflow: the contents of .gitmodules are as follows.

[submodule "sub-project1"]
    path = sub-project1
    url = https://example.com/demo/sub-project1
[submodule "sub-project2"]
    path = sub-project2
    url = https://example.com/demo/sub-project2
[submodule "sub-project3"]
    path = sub-project3
    url = https://example.com/demo/sub-project3
[submodule "sub-project4"]
    path = sub-project4
    url = https://example.com/demo/sub-project4

This time you want to refactor some code in the shared directory of main-project, and you know that only sub-project1 and sub-project2 reference the shared code, so you don't need to clone and check out sub-project3 and sub-project4. You just run the command below.

git submodule init sub-project1 sub-project2

As in Workflow 1, you then run the command below to clone them and check them out.

git submodule update

Would I run git submodule update --remote in this case? Do I even have to init and update the submodules just to refactor code in the shared directory? Yes: you have to run the submodules' tests after refactoring the shared code, and if any changes to the submodules are committed and pushed to the remote repository while you are refactoring, you need git submodule update --remote to get them.