First, let's see the code:
using System.IO;
using System.Text;

// utf8.txt is encoded as UTF-8
using (StreamReader reader = new StreamReader(@"C:\utf8.txt", Encoding.UTF8, true))
{
    // Peek() returns -1 at the end of the stream
    while (reader.Peek() >= 0)
    {
        // What is the encoding of lineFromTxtFile?
        string lineFromTxtFile = reader.ReadLine();
    }
}
As Joel said in his famous article:
"If you have a string, in memory, in a file, or in an email message, you have to know what encoding it is in or you cannot interpret it or display it to users correctly."
So here is my question: what is the encoding of the string lineFromTxtFile? Is it UTF-8 (because it comes from a text file encoded in UTF-8), or UTF-16 (because a string in .NET is "Unicode", i.e. UTF-16)?
Thanks.
All .NET strings are held in memory as UTF-16 (the encoding exposed as Encoding.Unicode, little endian), no matter where the text came from. The StreamReader decodes the file's UTF-8 bytes into that form as it reads. Even better, because you know your text file is UTF-8 and passed the correct encoding to the StreamReader constructor, non-ASCII characters will be decoded correctly.
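To see the difference between the bytes on disk and the string in memory, here is a minimal sketch. It reads from a MemoryStream instead of a real file so it is self-contained; the class name EncodingDemo and the helper DecodeUtf8 are made up for illustration and are not part of the question's code.

```csharp
using System;
using System.IO;
using System.Text;

public static class EncodingDemo
{
    // Hypothetical helper: decodes UTF-8 bytes into a .NET string,
    // the same way a StreamReader would when reading a UTF-8 file.
    public static string DecodeUtf8(byte[] utf8Bytes)
    {
        using (var reader = new StreamReader(new MemoryStream(utf8Bytes), Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }

    public static void Main()
    {
        // "€" (U+20AC) as it would be stored in a UTF-8 file: three bytes.
        byte[] utf8Bytes = { 0xE2, 0x82, 0xAC };
        string text = DecodeUtf8(utf8Bytes);

        // In memory the string is a single UTF-16 code unit.
        Console.WriteLine(text.Length);                    // 1
        Console.WriteLine(((int)text[0]).ToString("X4"));  // 20AC

        // Bytes only reappear when you encode the string again,
        // and the count depends on the encoding you pick.
        Console.WriteLine(Encoding.UTF8.GetBytes(text).Length);                   // 3
        Console.WriteLine(BitConverter.ToString(Encoding.Unicode.GetBytes(text))); // AC-20
    }
}
```

So the string itself has no trace of the file's UTF-8 encoding. UTF-8 only matters at the boundary, when bytes are decoded into a string or a string is encoded back into bytes.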