What can cause git to mess with character encoding?

Samuel Rossille · May 16, 2012 · Viewed 19.8k times

Edit: git does not mess with character encoding. This question is still here to share knowledge and help others avoid making the same mistake.


The context: My company uses an svn repository, and I use git-svn as a client to interact with it. All text files in the project are (and must be) encoded with the Windows default encoding (cp-....). I use Git Extensions, and sometimes the command line, to drive git.

What I did: Over the last 3 days, I worked on a new feature and made a number of local commits. Finally, I squashed all these commits into a single one using an interactive rebase, then used git svn dcommit to push everything to the svn repository as a single commit.

What happened then: A colleague told me that, after my commit, all accents were messed up in the files I had modified and in the new files. I had already committed text files with accents to the same repository with this same git + svn setup, and this is the first time I have faced this issue.

My investigation: I opened the files with Notepad++ and tried the most common encodings (including the Windows default and UTF-8) to view them. None of them displayed the accents properly, and different accented characters were always rendered as the same sequence of strange glyphs.
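A byte-level check can complement this kind of visual inspection. The following is only a minimal sketch: `diagnose` and `diagnose_file` are hypothetical helper names, and `cp1252` is an assumed stand-in for the Windows default code page, which is not spelled out above. It reports whether the bytes decode as UTF-8 or as the legacy code page, and counts occurrences of the UTF-8 encoding of U+FFFD (the Unicode replacement character), which is one plausible reason for different accents all showing up as the same glyph sequence.

```python
# Bytes of U+FFFD (the Unicode replacement character) when encoded as UTF-8.
REPLACEMENT_UTF8 = "\ufffd".encode("utf-8")  # b'\xef\xbf\xbd'


def diagnose(data: bytes, legacy_codec: str = "cp1252") -> dict:
    """Report simple byte-level facts about a file's content.

    legacy_codec defaults to cp1252, which is an assumption; pass the
    code page your project actually uses.
    """
    def decodes(codec: str) -> bool:
        try:
            data.decode(codec)
            return True
        except UnicodeDecodeError:
            return False

    return {
        "valid_utf8": decodes("utf-8"),
        "valid_legacy": decodes(legacy_codec),
        "non_ascii_bytes": sum(b >= 0x80 for b in data),
        "replacement_chars": data.count(REPLACEMENT_UTF8),
    }


def diagnose_file(path: str, legacy_codec: str = "cp1252") -> dict:
    """Read a file in binary mode (no decoding) and diagnose its bytes."""
    with open(path, "rb") as fh:
        return diagnose(fh.read(), legacy_codec)
```

A file with a non-zero `replacement_chars` count has most likely lost its original accented characters already, so no choice of viewing encoding will bring them back. A file that is not valid UTF-8 but has non-ASCII bytes is more likely still intact in a legacy code page.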

The temporary workaround: I quickly created a revert commit with Git Extensions and "dcommitted" it.

The question: My company's svn repository is OK again, but I now have the following two problems to solve:

  1. Understand what happened to the accented characters
  2. Retrieve my work from the SVN history and commit it properly (if possible, without manually reviewing every accented character)

Can anybody provide some clues? (I'm rather new to git.)

Answer

Samuel Rossille · May 18, 2012

And now let's reveal the painful truth (painful for my ego, not for git users): I did mess with the accents, not git.

I could have just deleted the question, which wrongly suggests that git can mess up accents, but considering the number of upvotes, I think that a lot of people make the same mistake I did. So I have chosen to answer my own question to set the record straight, and maybe help people in the same situation:

  1. Git does not touch any characters other than line breaks.
  2. I broke the accents before committing, and I did not notice because I was not paying enough attention. It happened when I edited some of the files with Eclipse: Eclipse did not recognize the encoding, and the accents were all replaced by a weird byte sequence on save. That's all.
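To illustrate how an editor can do this kind of damage, here is a minimal sketch. It assumes the files were in `cp1252` and that the editor read and saved them as UTF-8, substituting the replacement character for bytes it could not decode. Both codec choices are assumptions for illustration, not a confirmed account of what Eclipse did, and `lossy_round_trip` is a hypothetical helper name.

```python
def lossy_round_trip(text: str,
                     file_codec: str = "cp1252",
                     editor_codec: str = "utf-8") -> bytes:
    """Simulate an editor opening a file with the wrong encoding and saving it.

    file_codec and editor_codec are assumed values for illustration.
    """
    original_bytes = text.encode(file_codec)
    # Undecodable bytes become U+FFFD; the original byte value is lost here.
    seen_by_editor = original_bytes.decode(editor_codec, errors="replace")
    # On save, every U+FFFD is written out as the same three bytes.
    return seen_by_editor.encode(editor_codec)


for ch in ("é", "è", "à"):
    # Each distinct accent prints the same bytes: b'\xef\xbf\xbd'
    print(ascii(ch), lossy_round_trip(ch))
```

Under these assumptions every accented character collapses into the same byte sequence, which would match the symptom of different accents being rendered as identical strange glyphs. It also means the original characters cannot be recovered from the saved file and have to come from an earlier, undamaged revision.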

Thanks again to Dmitry Pavlenko for giving me pointers on how to investigate this problem.

+1 to "git reflog"

Happy accent fixing ;=)