Unescaping XML entities using XmlReader in .NET?

Philippe Leybaert · Mar 14, 2011 · Viewed 12.1k times

I'm trying to unescape XML entities in a string in .NET (C#), but I can't get it to work correctly.

For example, if I have the string AT&amp;T, it should be translated to AT&T.

One way is to use HttpUtility.HtmlDecode(), but that's for HTML.

So I have two questions about this:

  1. Is it safe to use HttpUtility.HtmlDecode() for decoding XML entities?

  2. How do I use XmlReader (or something similar) to do this? I have tried the following, but that always returns an empty string:

    static string ReplaceEscapes(string text)
    {
        StringReader reader = new StringReader(text);
    
        XmlReaderSettings settings = new XmlReaderSettings();
    
        settings.ConformanceLevel = ConformanceLevel.Fragment;
    
        using (XmlReader xmlReader = XmlReader.Create(reader, settings))
        {
            return xmlReader.ReadString();
        }
    }
    

Answer

adrianbanks · Mar 14, 2011

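HTML escaping and XML escaping are closely related. As you have said, HttpUtility has both HtmlEncode and HtmlDecode methods. These will also work on XML, as only a few characters need escaping in both HTML and XML: <, >, ", ' and &. Note that HtmlDecode also decodes HTML-only named entities (such as &nbsp;), so it accepts more than a strict XML parser would.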

The downside of using the HttpUtility class is that you need a reference to the System.Web dll, which also brings in a lot of other stuff that you probably don't want.

Specifically for XML, the SecurityElement class has an Escape method that will do the encoding, but does not have a corresponding Unescape method. You therefore have a few options:

  1. Use HttpUtility.HtmlDecode() and put up with a reference to System.Web.

  2. Roll your own decode method that takes care of the special characters (there are only a handful; look at the static constructor of SecurityElement in Reflector to see the full list).

  3. Use a (hacky) solution like:


    public static string Unescape(string text)
    {
        XmlDocument doc = new XmlDocument();
        string xml = string.Format("<dummy>{0}</dummy>", text);
        doc.LoadXml(xml);
        return doc.DocumentElement.InnerText;
    }
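
Option 2 (rolling your own decoder) could look roughly like the sketch below. The class name XmlEntities and the regular expression are illustrative assumptions, not something taken from SecurityElement. It handles the five predefined XML entities and, as an addition beyond the list above, numeric character references; anything else is left untouched. It does not check that a decoded character is actually legal in XML.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper, named for illustration only.
public static class XmlEntities
{
    // A predefined entity name, a decimal reference (&#65;) or a hex reference (&#x41;).
    private static readonly Regex EntityPattern =
        new Regex(@"&(lt|gt|amp|quot|apos|#[0-9]+|#x[0-9A-Fa-f]+);");

    public static string Unescape(string text)
    {
        // Regex.Replace makes a single pass, so "&amp;lt;" becomes "&lt;" and not "<".
        return EntityPattern.Replace(text, delegate(Match m)
        {
            string body = m.Groups[1].Value;
            switch (body)
            {
                case "lt": return "<";
                case "gt": return ">";
                case "amp": return "&";
                case "quot": return "\"";
                case "apos": return "'";
            }

            // Otherwise it is a numeric character reference: "#NNN" or "#xHHH".
            int codePoint;
            bool parsed;
            if (body[1] == 'x')
            {
                parsed = int.TryParse(body.Substring(2), NumberStyles.AllowHexSpecifier,
                                      CultureInfo.InvariantCulture, out codePoint);
            }
            else
            {
                parsed = int.TryParse(body.Substring(1), NumberStyles.None,
                                      CultureInfo.InvariantCulture, out codePoint);
            }

            // Leave out-of-range values and surrogate code points as they were.
            if (!parsed || codePoint < 0 || codePoint > 0x10FFFF ||
                (codePoint >= 0xD800 && codePoint <= 0xDFFF))
            {
                return m.Value;
            }

            return char.ConvertFromUtf32(codePoint);
        });
    }
}
```

With this sketch, XmlEntities.Unescape("AT&amp;T") returns "AT&T", while an entity that XML does not predefine, such as &nbsp;, is passed through unchanged instead of throwing.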

Personally, I would use HttpUtility.HtmlDecode() if I already had a reference to System.Web, or roll my own if not. I don't like your XmlReader approach, as XmlReader is IDisposable, which usually indicates that it holds resources that need to be released, and so it may be a costly operation.
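
If you do want to stay with XmlReader, the version in the question most likely returns an empty string because ReadString() is called before the reader has been advanced to the first node. A minimal sketch of a variant that calls Read() first and collects the text nodes is below; the class name XmlFragmentText is an assumption for illustration. Unlike HtmlDecode, it throws an XmlException on input that is not a well-formed fragment (for example a bare & or an undefined entity such as &nbsp;), and any markup in the input is parsed as markup, with only its text content returned.

```csharp
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical helper, named for illustration only.
public static class XmlFragmentText
{
    public static string ReplaceEscapes(string text)
    {
        XmlReaderSettings settings = new XmlReaderSettings();
        // Fragment conformance allows text at the top level, without a root element.
        settings.ConformanceLevel = ConformanceLevel.Fragment;

        StringBuilder result = new StringBuilder();
        using (StringReader reader = new StringReader(text))
        using (XmlReader xmlReader = XmlReader.Create(reader, settings))
        {
            // Read() moves to the next node; before the first call the reader
            // is not positioned on any node.
            while (xmlReader.Read())
            {
                switch (xmlReader.NodeType)
                {
                    case XmlNodeType.Text:
                    case XmlNodeType.CDATA:
                    case XmlNodeType.Whitespace:
                    case XmlNodeType.SignificantWhitespace:
                        // Value holds the node text with entities already expanded.
                        result.Append(xmlReader.Value);
                        break;
                }
            }
        }
        return result.ToString();
    }
}
```

For the example from the question, XmlFragmentText.ReplaceEscapes("AT&amp;T") returns "AT&T".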