How to best detect encoding in XML file?

Peter Lillevold picture Peter Lillevold · Mar 12, 2009 · Viewed 10.1k times · Source

To load XML files with arbitrary encoding I have the following code:

Encoding encoding;
using (var reader = new XmlTextReader(filepath))
{
    reader.MoveToContent();
    encoding = reader.Encoding;
}

var settings = new XmlReaderSettings { NameTable = new NameTable() };
var xmlns = new XmlNamespaceManager(settings.NameTable);
var context = new XmlParserContext(null, xmlns, "", XmlSpace.Default, 
    encoding);
using (var reader = XmlReader.Create(filepath, settings, context))
{
    return XElement.Load(reader);
}

This works, but it seems a bit inefficient to open the file twice. Is there a better way to detect the encoding such that I can do:

  1. Open file
  2. Detect encoding
  3. Read XML into an XElement
  4. Close file

Answer

Peter Lillevold picture Peter Lillevold · Mar 12, 2009

Ok, I should have thought of this earlier. Both XmlTextReader (which gives us the Encoding) and XmlReader.Create (which allows us to specify encoding) accepts a Stream. So how about first opening a FileStream and then use this with both XmlTextReader and XmlReader, like this:

using (var txtreader = new FileStream(filepath, FileMode.Open))
{
    using (var xmlreader = new XmlTextReader(txtreader))
    {
        // Read in the encoding info
        xmlreader.MoveToContent();
        var encoding = xmlreader.Encoding;

        // Rewind to the beginning
        txtreader.Seek(0, SeekOrigin.Begin);

        var settings = new XmlReaderSettings { NameTable = new NameTable() };
        var xmlns = new XmlNamespaceManager(settings.NameTable);
        var context = new XmlParserContext(null, xmlns, "", XmlSpace.Default,
                 encoding);

        using (var reader = XmlReader.Create(txtreader, settings, context))
        {
            return XElement.Load(reader);
        }
    }
}

This works like a charm. Reading XML files in an encoding independent way should have been more elegant but at least I'm getting away with only one file open.