How to perform better document version control on Excel files and SQL schema files

Marcus Thornton · Jun 13, 2013 · Viewed 88.2k times

I am in charge of several Excel files and SQL schema files. How should I perform better document version control on these files?

I need to see which parts of these files changed between versions and keep every version for reference. Currently I append a timestamp to the file name, but this has proved inefficient.

Is there a tool or good practice for better document version control?

By the way, editors send me the files via email.

Answer

1615903 · Jun 14, 2013

The answer I have written here can be applied in this case. A tool called xls2txt can provide human-readable output from .xls files. So in short, you should put this in your .gitattributes file:

*.xls diff=xls

And add this to .git/config:

[diff "xls"]
    binary = true
    textconv = /path/to/xls2txt
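
To see how the two pieces fit together without installing xls2txt, here is a rough sketch in a throwaway repository. The `demo` driver, the `.dat` extension and the `fake2txt` script are hypothetical stand-ins, not real tools; the only point is that git runs the textconv command with the file path as its argument and diffs whatever the command prints. It uses `git config` instead of editing .git/config by hand, which should write the same section.

```shell
# Scratch repository so nothing real is touched.
repo="$(mktemp -d)"
cd "$repo"
git init -q .
git config user.name "Example"
git config user.email "example@example.com"

# Hypothetical stand-in for xls2txt: prints one ';'-separated field per line.
cat > fake2txt <<'EOF'
#!/bin/sh
tr ';' '\n' < "$1"
EOF
chmod +x fake2txt

# Same pairing as above: attribute in .gitattributes, driver in .git/config.
echo '*.dat diff=demo' > .gitattributes
git config diff.demo.textconv "$repo/fake2txt"

printf 'a;b;c\n' > sheet.dat
git add .
git commit -q -m "first version"
printf 'a;B;c\n' > sheet.dat

# The diff is computed on the converter's output, not on the raw file.
out="$(git diff -- sheet.dat)"
echo "$out"
```

The diff should show the changed field on a line of its own, because git compares the converted text. With a real converter such as xls2txt, the same mechanism applies to .xls files.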

Of course, I'm sure you can find similar tools for other file types as well, making git diff a very useful tool for office documents. This is what I currently have in my global .gitconfig:

[diff "xls"]
    binary = true
    textconv = /usr/bin/py_xls2txt
[diff "pdf"]
    binary = true
    textconv = /usr/bin/pdf2txt
[diff "doc"]
    binary = true
    textconv = /usr/bin/catdoc
[diff "docx"]
    binary = true
    textconv = /usr/bin/docx2txt
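
Driver definitions like these only take effect for paths whose `diff` attribute names them, so each one needs a matching attributes line. A small sketch, assuming the same pattern as the `*.xls` line above and using made-up file names, that checks which driver git would pick:

```shell
# Scratch repository for the check.
repo="$(mktemp -d)"
cd "$repo"
git init -q .

# One pattern per driver defined in the global .gitconfig above.
cat > .gitattributes <<'EOF'
*.xls  diff=xls
*.pdf  diff=pdf
*.doc  diff=doc
*.docx diff=docx
EOF

# Ask git which diff driver applies to some example (hypothetical) file names.
checked="$(git check-attr diff -- budget.xls manual.pdf notes.docx)"
echo "$checked"
```

To apply the patterns to every repository rather than one, the same lines can go in the per-user attributes file (`$XDG_CONFIG_HOME/git/attributes` by default, or whatever `core.attributesFile` points to).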

The Pro Git book has a good chapter on the subject: 8.2 Customizing Git - Git Attributes