When should pdf files be tracked in a Git repository and when not

uli_1973 · Jul 21, 2013

I am developing a LaTeX package (http://www.openlilylib.org/lilyglyphs) which contains a number of small PDF files. Currently there are only a few dozen of them, but as the package and its user base grow there will probably be hundreds of them (but unlikely more than 1000).

The PDFs are typically only a few KB in size, but I don't know whether to track them in the Git repository. The files are subject to change at any time, but probably not too often.
Usually one is told not to track binary files which can't be diffed, but I have also read that this doesn't really matter with smaller files and a smaller overall volume. I think in the end the PDFs will add up to no more than a few MB in total.

The package will be available as a download or through the Git repository, which I prefer because using the package quite naturally leads to contributing ...
Currently, after cloning the Git repository one has to rebuild the PDFs using Python and the LilyPond notation software, so the hurdle is rather high. That is why I would like to have the PDFs directly in the repo.

Any thoughts?


EDIT in response to answers/comments:

The pdf files are generated from the sources in the repository, which is why I'm reluctant to track them in Git.
But:

  • The pdfs are necessary to use the package so the user needs to have them
  • To generate the pdfs one needs Python as well as LilyPond, and neither of them is necessary to use the package. So I feel it is too big a burden to require someone to install two programs just to install my package.
    I don't see a problem requiring someone who decides to clone a Git repo to run an install script, but the software dependencies are maybe too high?
  • Currently generating the pdfs finishes in reasonable time because there are only a few dozen. But with a growing number of files this time could become unacceptable.
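
One way to keep the regeneration time in check, sketched here under assumptions, is to rebuild only those PDFs that are missing or older than their source. The directory layout, the one-to-one `name.ly` to `name.pdf` mapping, and the function name `stale_sources` are hypothetical and not taken from the actual lilyglyphs build scripts.

```python
from pathlib import Path


def stale_sources(source_dir, pdf_dir):
    """Return source files whose PDF is missing or older than the source.

    Assumes (hypothetically) one `name.ly` source per `name.pdf` output.
    """
    stale = []
    for src in sorted(Path(source_dir).glob("*.ly")):
        pdf = Path(pdf_dir) / (src.stem + ".pdf")
        # Rebuild when the PDF does not exist or predates its source.
        if not pdf.exists() or pdf.stat().st_mtime < src.stat().st_mtime:
            stale.append(src)
    return stale
```

Only the returned files would then need to be passed to LilyPond. Note that this check compares timestamps only, so it would not notice a new LilyPond version; that case would still require a full rebuild.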

The pdf files change when they are updated/corrected. This won't happen often, and I think this is covered by tracking the source code. But the pdfs will also change whenever there is a new version of LilyPond available, which may be every two to four weeks. So while the source remains the same the pdfs will change regularly - which is a clear indicator against tracking them with Git.
On the other hand we are talking about (possibly) a few hundred files of a few KB each, so I don't know if it's worth bothering about the issue at all.
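
To put a rough number on this, here is a back-of-the-envelope sketch. It assumes the worst case: every PDF changes on every LilyPond release, and Git cannot compress a new version against the old one. The function name and the sample figures (300 files, 5 KB each, a release every 4 weeks) are illustrative assumptions, not measurements.

```python
def yearly_pdf_churn_mb(num_files, avg_kb, weeks_between_releases):
    """Worst-case MB of new PDF data added to the history per year.

    Assumes every PDF changes on every release and that none of the
    new versions can be stored as a small delta against the old ones.
    """
    releases_per_year = 52 / weeks_between_releases
    return num_files * avg_kb * releases_per_year / 1024


# Illustrative numbers only: 300 files of 5 KB, a release every 4 weeks.
print(round(yearly_pdf_churn_mb(300, 5, 4), 1))
```

With these sample figures the history would grow by roughly 19 MB per year in the worst case; with releases every two weeks the figure doubles.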

Answer

platforms · Jul 21, 2013

If the documents don't change, there is no reason to track their changes in git. No revisions, no need for revision control.

But if they do change over time, and someone may need to consult the old document versions for any reason, consider these questions:

  1. Is it impossible or impractical to recreate the old versions of the documents?
  2. Is there any underlying data outside of version control that has changed, or is it still in the same state?
  3. Is the data in the documents tied to source code releases?

If the answers to these questions are yes, then they may be good candidates for version control under git.