I am trying to copy a byte stream from a database, decode it, and finally display it on a web page. However, I am noticing different behavior when decoding the content in different ways (note: I am using the "Western European" encoding, which has a Latin character set and does not support Chinese characters):
```csharp
var encoding = Encoding.GetEncoding(1252 /*Western European*/);
using (var fileStream = new StreamReader(new MemoryStream(content), encoding))
{
    var str = fileStream.ReadToEnd();
}
```
Vs.
```csharp
var encoding = Encoding.GetEncoding(1252 /*Western European*/);
var str = new string(encoding.GetChars(content));
```
If the content contains Chinese characters, then the first block of code produces a string like "D$教学而设计的", which is incorrect because the encoding shouldn't support those characters, while the second block produces "D$æ•™å¦è€Œè®¾è®¡çš„", which is correct, as those are all in the Western European character set.
What is the explanation for this difference in behavior?
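The `StreamReader(Stream, Encoding)` constructor looks for a byte order mark (BOM) at the start of the stream and, if it finds one, uses the encoding the BOM indicates, even if you passed a different encoding. It sees the UTF-8 BOM in your data and correctly decodes the content as UTF-8. `Encoding.GetChars`, by contrast, never looks for a BOM; it simply maps every byte through the encoding you give it.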
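To see the detection happen, here is a minimal sketch, assuming .NET 5 or later, where code page 1252 is only available after registering `CodePagesEncodingProvider`. The byte array is hypothetical sample data standing in for the database content (a UTF-8 BOM followed by the UTF-8 bytes of "D$教学"), and the class and method names are illustrative only.

```csharp
using System;
using System.IO;
using System.Text;

public static class BomDemo
{
    // Assumes .NET 5+: Windows-1252 comes from the code-pages provider,
    // which must be registered once before Encoding.GetEncoding(1252).
    static BomDemo()
    {
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    public static Encoding Cp1252 => Encoding.GetEncoding(1252);

    // Hypothetical sample data: UTF-8 BOM + UTF-8 encoding of "D$教学".
    public static byte[] SampleContent() => new byte[]
    {
        0xEF, 0xBB, 0xBF,      // UTF-8 byte order mark
        (byte)'D', (byte)'$',
        0xE6, 0x95, 0x99,      // 教 in UTF-8
        0xE5, 0xAD, 0xA6       // 学 in UTF-8
    };

    // Two-argument constructor: BOM detection is on, so the UTF-8 BOM
    // overrides the Windows-1252 encoding that was passed in.
    public static (string Text, string EncodingName) ReadWithStreamReader(byte[] content)
    {
        using (var reader = new StreamReader(new MemoryStream(content), Cp1252))
        {
            string text = reader.ReadToEnd();
            // CurrentEncoding reflects the detected encoding after the first read.
            return (text, reader.CurrentEncoding.WebName);
        }
    }

    // GetChars never inspects a BOM: every byte, including EF BB BF,
    // is mapped through Windows-1252.
    public static string ReadWithGetChars(byte[] content)
    {
        return new string(Cp1252.GetChars(content));
    }

    public static void Main()
    {
        byte[] content = SampleContent();
        var (text, name) = ReadWithStreamReader(content);
        Console.WriteLine($"StreamReader ({name}): {text}");
        Console.WriteLine($"GetChars (windows-1252): {ReadWithGetChars(content)}");
    }
}
```

In this sketch the `StreamReader` returns "D$教学" and reports `utf-8` as its `CurrentEncoding`, while `GetChars` turns the same bytes, BOM included, into Windows-1252 characters.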
To prevent this behavior, pass `false` as the third parameter (`detectEncodingFromByteOrderMarks`):

```csharp
var fileStream = new StreamReader(new MemoryStream(content), encoding, detectEncodingFromByteOrderMarks: false);
```
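A minimal sketch of that fix, again assuming .NET 5 or later (hence the `CodePagesEncodingProvider` registration) and using hypothetical sample bytes and illustrative names. With detection turned off, the reader has no BOM to honor, so it decodes the three BOM bytes as the ordinary Windows-1252 characters "ï»¿" and produces the same string as `Encoding.GetChars`.

```csharp
using System;
using System.IO;
using System.Text;

public static class NoBomDetection
{
    // Assumes .NET 5+: register the provider that supplies code page 1252.
    static NoBomDetection()
    {
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    public static Encoding Cp1252 => Encoding.GetEncoding(1252);

    // Hypothetical sample data: UTF-8 BOM + UTF-8 encoding of "D$教学".
    public static byte[] SampleContent() => new byte[]
    {
        0xEF, 0xBB, 0xBF, (byte)'D', (byte)'$',
        0xE6, 0x95, 0x99, 0xE5, 0xAD, 0xA6
    };

    // Detection disabled: the encoding passed in is used for every byte.
    public static string ReadWithoutDetection(byte[] content)
    {
        using (var reader = new StreamReader(new MemoryStream(content), Cp1252,
                                             detectEncodingFromByteOrderMarks: false))
        {
            return reader.ReadToEnd();
        }
    }

    public static void Main()
    {
        byte[] content = SampleContent();
        string viaReader = ReadWithoutDetection(content);
        string viaGetChars = new string(Cp1252.GetChars(content));
        Console.WriteLine(viaReader);
        Console.WriteLine(viaReader == viaGetChars); // True
    }
}
```

Passing the argument by name makes it obvious at the call site which behavior the `false` controls.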