Serializing an object as UTF-8 XML in .NET

Garry Shutler picture Garry Shutler · Oct 5, 2010 · Viewed 149.1k times · Source

Proper object disposal removed for brevity but I'm shocked if this is the simplest way to encode an object as UTF-8 in memory. There has to be an easier way doesn't there?

var serializer = new XmlSerializer(typeof(SomeSerializableObject));

var memoryStream = new MemoryStream();
var streamWriter = new StreamWriter(memoryStream, System.Text.Encoding.UTF8);

serializer.Serialize(streamWriter, entry);

memoryStream.Seek(0, SeekOrigin.Begin);
var streamReader = new StreamReader(memoryStream, System.Text.Encoding.UTF8);
var utf8EncodedXml = streamReader.ReadToEnd();

Answer

Jon Skeet picture Jon Skeet · Oct 5, 2010

No, you can use a StringWriter to get rid of the intermediate MemoryStream. However, to force it into XML you need to use a StringWriter which overrides the Encoding property:

public class Utf8StringWriter : StringWriter
{
    public override Encoding Encoding => Encoding.UTF8;
}

Or if you're not using C# 6 yet:

public class Utf8StringWriter : StringWriter
{
    public override Encoding Encoding { get { return Encoding.UTF8; } }
}

Then:

var serializer = new XmlSerializer(typeof(SomeSerializableObject));
string utf8;
using (StringWriter writer = new Utf8StringWriter())
{
    serializer.Serialize(writer, entry);
    utf8 = writer.ToString();
}

Obviously you can make Utf8StringWriter into a more general class which accepts any encoding in its constructor - but in my experience UTF-8 is by far the most commonly required "custom" encoding for a StringWriter :)

Now as Jon Hanna says, this will still be UTF-16 internally, but presumably you're going to pass it to something else at some point, to convert it into binary data... at that point you can use the above string, convert it into UTF-8 bytes, and all will be well - because the XML declaration will specify "utf-8" as the encoding.

EDIT: A short but complete example to show this working:

using System;
using System.Text;
using System.IO;
using System.Xml.Serialization;

public class Test
{    
    public int X { get; set; }

    static void Main()
    {
        Test t = new Test();
        var serializer = new XmlSerializer(typeof(Test));
        string utf8;
        using (StringWriter writer = new Utf8StringWriter())
        {
            serializer.Serialize(writer, t);
            utf8 = writer.ToString();
        }
        Console.WriteLine(utf8);
    }


    public class Utf8StringWriter : StringWriter
    {
        public override Encoding Encoding => Encoding.UTF8;
    }
}

Result:

<?xml version="1.0" encoding="utf-8"?>
<Test xmlns:xsd="http://www.w3.org/2001/XMLSchema" 
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <X>0</X>
</Test>

Note the declared encoding of "utf-8" which is what we wanted, I believe.