What is the advantage of git lfs?

Sanster · Feb 23, 2016

GitHub limits the size of files you can push. So if you want to push a large file to your repo, you have to use Git LFS.

I know it's a bad idea to add binary files to a Git repo. But suppose I am running GitLab on my own server, there is no file-size limit for a repo, and I don't care if the repo becomes very large on the server. In that case, what's the advantage of Git LFS? Will git clone or git checkout be faster?

Answer

Matthieu Moy · Feb 23, 2016

One distinctive feature of Git (and other distributed systems) compared to centralized systems is that each repository contains the whole history of the project. Suppose you create a 100 MB file and modify it 100 times in a way that doesn't compress well. You'll end up with a 10 GB repository. This means that every clone will download 10 GB of data and eat 10 GB of disk space on each machine where you make a clone. What's even more frustrating: you'd still have to download those 10 GB even after you git rm the big files, because the old versions remain in the history.
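To make the arithmetic above concrete, here is a back-of-the-envelope sketch. It assumes the worst case described in the answer: each version is stored whole because the data doesn't compress or delta well, and it treats the file as having about 100 stored versions (strictly, the original plus 100 modifications is 101). The function names are hypothetical, chosen only for illustration.

```python
"""Rough size of a plain Git clone when a large, poorly compressible
file is committed many times (numbers taken from the answer)."""

MB = 10**6  # decimal units, matching "100 MB" and "10 GB" in the text
GB = 10**9


def full_history_clone_bytes(file_size: int, versions: int) -> int:
    """Worst case: every version is stored whole, because delta
    compression cannot shrink data that doesn't compress well."""
    return file_size * versions


def checkout_only_bytes(file_size: int) -> int:
    """What one working copy actually needs: just the current version."""
    return file_size


if __name__ == "__main__":
    size, versions = 100 * MB, 100
    print(f"clone downloads ~{full_history_clone_bytes(size, versions) / GB:.0f} GB")
    print(f"checkout needs  ~{checkout_only_bytes(size) / MB:.0f} MB")
```

The gap between those two numbers is exactly what a plain Git clone pays for carrying the full history.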

Putting big files in a separate system like git-lfs allows you to store only pointers to each version of the file in the repository, so each clone downloads only a tiny piece of data per revision. The checkout then downloads only the version you are actually using, i.e. 100 MB in the example above. As a result, you still use disk space on the server, but you save a lot of bandwidth and disk space on the client.
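As an illustration of what such a pointer looks like, here is a minimal sketch that builds the small text stub Git LFS commits in place of a file's contents. It assumes the Git LFS v1 pointer format (a `version` line, an `oid sha256:` line and a `size` line). This is a simplified model, not the git-lfs client: the real tool also uploads the object to the LFS server and wires everything up through Git's clean/smudge filters. The function `lfs_pointer` is hypothetical.

```python
import hashlib


def lfs_pointer(data: bytes) -> str:
    """Return the text pointer that stands in for `data` in the Git history.

    Only this short text is stored in Git; the actual bytes live on the
    LFS server and are fetched by the SHA-256 object id when checked out.
    """
    oid = hashlib.sha256(data).hexdigest()
    return (
        "version https://git-lfs.github.com/spec/v1\n"
        f"oid sha256:{oid}\n"
        f"size {len(data)}\n"
    )


if __name__ == "__main__":
    big = bytes(100 * 10**6)  # 100 MB of zero bytes stands in for the large file
    pointer = lfs_pointer(big)
    print(pointer, end="")
    print(len(pointer.encode()), "bytes in Git instead of", len(big))
```

Each of the 100 revisions in the example would then add only a pointer of this size (about a hundred bytes) to the history, instead of another 100 MB blob.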

In addition, the algorithm used by git gc (internally, git repack) does not always work well with big files. Recent versions of Git have made progress in this area and it should work reasonably well, but a big repository with big files in it may eventually get you into trouble (for example, not having enough RAM to repack it).