Version control for DOCX and PDF?

Jungle Hunter · Jul 21, 2010

I've been playing around with git and hg lately, and it suddenly occurred to me that this kind of thing would be great for documents.

I have a document which I edit as DOCX and export as PDF. I tried using both git and hg to version control it, and it turns out that with hg you end up tracking only a binary, so diffing isn't meaningful. Although with git I can meaningfully diff DOCX (I haven't tried PDF yet), I was wondering if there is a better way to do it than what I'm doing right now. (Ideally, the best solution would let me diff without leaving Word.)

Answer

Will Dean · Jul 21, 2010

There are two different concepts here. One is: can the version control system make some intelligent judgements about the contents of files, so that it can store just the delta between revisions (and do things like assign responsibility for individual parts of a file)?
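To make the first concept concrete, here is a toy sketch of line-based delta storage using Python's standard `difflib`: the newer revision is stored as "copy these line ranges from the old revision" plus only the lines that actually changed. The function names are made up for illustration, and this is not how git or Mercurial encode deltas internally. The idea works well on text; a DOCX file is a ZIP-compressed package of XML parts, so the version control system sees compressed bytes rather than lines, which tends to make this kind of comparison far less useful on the raw file.

```python
import difflib


def make_delta(old, new):
    """Describe `new` as a list of ops against `old`:
    ("copy", i, j) reuses old[i:j]; ("insert", lines) adds lines verbatim."""
    ops = []
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))  # unchanged lines: store only a range
        elif tag in ("replace", "insert"):
            ops.append(("insert", new[j1:j2]))  # changed lines: store the text
        # "delete": the old lines are simply never copied
    return ops


def apply_delta(old, ops):
    """Rebuild the newer revision from the older one plus its delta."""
    result = []
    for op in ops:
        if op[0] == "copy":
            result.extend(old[op[1]:op[2]])
        else:
            result.extend(op[1])
    return result


rev1 = ["Title", "Intro paragraph", "Body text", "Conclusion"]
rev2 = ["Title", "Intro paragraph", "Revised body text", "Conclusion"]
delta = make_delta(rev1, rev2)
print(delta)  # only "Revised body text" is stored as text; the rest are ranges
assert apply_delta(rev1, delta) == rev2
```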

The other is: do I have a file comparison tool which is useful for the types of files I keep in the version control system? Version control systems tend to ship with file comparison tools which are inferior to dedicated alternatives, but they can almost always be configured to use a better diff program, either for all file types or only for specific ones.
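For git specifically, one way to plug in a better "diff" for a particular file type is a textconv filter: git runs a program with a single argument (the name of the file to convert), reads plain text from its stdout, and diffs that text instead of the raw bytes. Below is a hypothetical `docx2txt.py` using only the Python standard library. It assumes a standard DOCX package with its main body in `word/document.xml` and extracts only paragraph text: formatting and comments are ignored, and text inside nested elements such as text boxes may appear twice. It could be registered with `git config diff.docx.textconv "python3 /path/to/docx2txt.py"` plus a `.gitattributes` line `*.docx diff=docx` (the driver name `docx` and the path are placeholders).

```python
"""docx2txt.py: print the paragraph text of a .docx file, one paragraph per line.

Hypothetical helper for git's textconv mechanism; assumes a standard
Office Open XML package whose main part is word/document.xml.
"""
import sys
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace for w:p (paragraph) and w:t (text) elements.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def paragraphs_from_xml(document_xml):
    """Return the plain text of each w:p element, in document order."""
    root = ET.fromstring(document_xml)
    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # Join every w:t under this paragraph; run formatting is discarded.
        paragraphs.append("".join(t.text or "" for t in para.iter(W_NS + "t")))
    return paragraphs


def docx_paragraphs(path):
    """Open the DOCX (a ZIP package) and extract its body paragraphs."""
    with zipfile.ZipFile(path) as package:
        return paragraphs_from_xml(package.read("word/document.xml"))


if __name__ == "__main__" and len(sys.argv) == 2:
    # git passes the name of the file to convert as the only argument.
    for line in docx_paragraphs(sys.argv[1]):
        print(line)
```

Splitting the XML parsing from the ZIP handling keeps the text extraction easy to check on its own.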

So it's common to use, for example, Beyond Compare as a general compare tool, with Word as a dedicated Word document comparer.
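The Beyond Compare plus Word arrangement could be approximated in git with a small dispatcher registered as a difftool, which picks a comparison program by file extension. This is a hypothetical sketch: `word-compare` and `bcompare` are placeholder commands (substitute whatever launches your Word comparison and your general tool on your platform), and it assumes git hands the two files over as `$LOCAL` and `$REMOTE`, for example via `git config difftool.dispatch.cmd 'python3 /path/to/difftool_dispatch.py "$LOCAL" "$REMOTE"'` and `git config diff.tool dispatch`, where the tool name `dispatch` and the path are also placeholders.

```python
"""difftool_dispatch.py: choose a comparison program by file extension.

Hypothetical wrapper for `git difftool`; the commands below are
placeholders, not real program names on any particular system.
"""
import os
import subprocess
import sys

# Placeholder commands (assumptions): a Word-aware comparer for Word files,
# and a general-purpose tool such as Beyond Compare for everything else.
TOOLS_BY_EXTENSION = {
    ".docx": ["word-compare"],  # hypothetical launcher for Word's Compare
    ".doc": ["word-compare"],
}
DEFAULT_TOOL = ["bcompare"]  # assumed Beyond Compare launcher on PATH


def pick_command(local, remote):
    """Build the argv that compares `local` against `remote`."""
    # Decide by the extension of the REMOTE side (case-insensitive).
    ext = os.path.splitext(remote)[1].lower()
    tool = TOOLS_BY_EXTENSION.get(ext, DEFAULT_TOOL)
    return tool + [local, remote]  # new list; the table is never mutated


if __name__ == "__main__" and len(sys.argv) == 3:
    # Forward the chosen tool's exit status back to git.
    sys.exit(subprocess.call(pick_command(sys.argv[1], sys.argv[2])))
```

Keeping the extension-to-tool mapping in one table means adding, say, a PDF-aware comparer later is a one-line change.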

Version control systems differ in how well people perceive them to handle 'binaries', but that often has as much to do with handling huge files and providing exclusive locking as with file comparison.