Proper object disposal removed for brevity but I'm shocked if this is the simplest way to encode an object as UTF-8 in memory. There has to be an easier way doesn't there?
var serializer = new XmlSerializer(typeof(SomeSerializableObject));
var memoryStream = new MemoryStream();
var streamWriter = new StreamWriter(memoryStream, System.Text.Encoding.UTF8);
serializer.Serialize(streamWriter, entry);
memoryStream.Seek(0, SeekOrigin.Begin);
var streamReader = new StreamReader(memoryStream, System.Text.Encoding.UTF8);
var utf8EncodedXml = streamReader.ReadToEnd();
No, you can use a StringWriter
to get rid of the intermediate MemoryStream
. However, to force it into XML you need to use a StringWriter
which overrides the Encoding
property:
public class Utf8StringWriter : StringWriter
{
public override Encoding Encoding => Encoding.UTF8;
}
Or if you're not using C# 6 yet:
public class Utf8StringWriter : StringWriter
{
public override Encoding Encoding { get { return Encoding.UTF8; } }
}
Then:
var serializer = new XmlSerializer(typeof(SomeSerializableObject));
string utf8;
using (StringWriter writer = new Utf8StringWriter())
{
serializer.Serialize(writer, entry);
utf8 = writer.ToString();
}
Obviously you can make Utf8StringWriter
into a more general class which accepts any encoding in its constructor - but in my experience UTF-8 is by far the most commonly required "custom" encoding for a StringWriter
:)
Now as Jon Hanna says, this will still be UTF-16 internally, but presumably you're going to pass it to something else at some point, to convert it into binary data... at that point you can use the above string, convert it into UTF-8 bytes, and all will be well - because the XML declaration will specify "utf-8" as the encoding.
EDIT: A short but complete example to show this working:
using System;
using System.Text;
using System.IO;
using System.Xml.Serialization;
public class Test
{
public int X { get; set; }
static void Main()
{
Test t = new Test();
var serializer = new XmlSerializer(typeof(Test));
string utf8;
using (StringWriter writer = new Utf8StringWriter())
{
serializer.Serialize(writer, t);
utf8 = writer.ToString();
}
Console.WriteLine(utf8);
}
public class Utf8StringWriter : StringWriter
{
public override Encoding Encoding => Encoding.UTF8;
}
}
Result:
<?xml version="1.0" encoding="utf-8"?>
<Test xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<X>0</X>
</Test>
Note the declared encoding of "utf-8" which is what we wanted, I believe.