What is the encoding of the string get from StreamReader.ReadLine()

jjooeell picture jjooeell · Nov 11, 2011 · Viewed 8.7k times · Source

First, let's see the code:

//The encoding of utf8.txt is UTF-8
StreamReader reader = new StreamReader(@"C:\\utf8.txt", Encoding.UTF8, true);
while (reader.Peek() > 0)
{
    //What is the encoding of lineFromTxtFile?
    string lineFromTxtFile = reader.ReadLine();
}

As Joel said in his famous article:

If you have a string, in memory, in a file, or in an email message, you have to know what encoding it is in or you cannot interpret it or display it to users correctly."

So here comes my question: what is the encoding of the string lineFromTxtFile? UTF-8(because it is from a text file encoded in UTF-8)? or UTF-16(because string in .NET is "Unicode"(UTF-16))?

Thanks.

Answer

Joel Coehoorn picture Joel Coehoorn · Nov 11, 2011

All .Net string variables are encoded with Encoding.Unicode (UTF-16, little endian). Even better, because you know your text file is utf-8 and told your streamreader the correct encoding in the constructor, any special characters will be handled correctly.