In my C# app, XML data may contain arbitrary element text that's already been pre-processed, so that (among other things) illegal characters have been converted to their escaped (xml character entity encoded) form.
Example: <myElement>this & that</myElement> has been converted to <myElement>this &amp; that</myElement>.
The problem is that when I use XmlTextWriter to save the file, the '&' is getting re-escaped into <myElement>this &amp;amp; that</myElement>. I don't want that extra &amp; in the string.
Another example: <myElement>• bullet</myElement>. My processing changes it to <myElement>&#8226; bullet</myElement>, which gets saved as <myElement>&amp;#8226; bullet</myElement>. All I want output to the file is the <myElement>&#8226; bullet</myElement> form.
I've tried various options on the various XmlWriters, etc., but can't seem to get the raw strings output correctly. And why can't the XML parser recognize, and not rewrite, escapes that are already valid?
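A minimal sketch that reproduces the behavior described above. It assumes the pre-processed string reaches the DOM through InnerText, which the post does not actually show; the class and method names (EscapeDemo, SetViaInnerText, SetViaInnerXml) are made up for illustration. InnerText treats its argument as literal characters, so the '&' of an existing escape is escaped again on output, whereas InnerXml parses its argument as markup.

```csharp
using System.Xml;

// Hypothetical helper, not part of the original post.
static class EscapeDemo {
    // Stores the string as literal text; every '&' is escaped when serialized.
    public static string SetViaInnerText(string text) {
        var doc = new XmlDocument();
        var el = doc.CreateElement("myElement");
        el.InnerText = text;
        doc.AppendChild(el);
        return doc.OuterXml;
    }

    // Parses the string as XML markup, so existing escapes are decoded first.
    public static string SetViaInnerXml(string markup) {
        var doc = new XmlDocument();
        var el = doc.CreateElement("myElement");
        doc.AppendChild(el);
        el.InnerXml = markup;
        return doc.OuterXml;
    }
}

// EscapeDemo.SetViaInnerText("this &amp; that")
//   returns <myElement>this &amp;amp; that</myElement>
// EscapeDemo.SetViaInnerXml("this &amp; that")
//   returns <myElement>this &amp; that</myElement>
```

Note that the InnerXml route decodes a numeric reference such as &#8226; into the character itself, so it does not by itself keep the reference form in the saved file.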
Update: after more debugging, I found that element text strings (actually all strings, including element tags, names, attributes, etc.) get encoded whenever they get copied into the .NET XML object data (CDATA being an exception) by an internal class called XmlCharType under System.Xml. So the problem has nothing to do with the XmlWriters. It looks like the best way to solve the problem is to un-escape the data when it's output, using something like:
string output = System.Net.WebUtility.HtmlDecode(xmlDoc.OuterXml);
Which will probably evolve into a custom XmlWriter in order to preserve formatting, etc.
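A small sketch of what that decode step does to serialized output, and one reason it may need to become something more targeted. DecodeDemo and DecodeOnce are hypothetical names; WebUtility.HtmlDecode is the real call from the line above. It removes one level of escaping from the whole string, which also covers escapes that need to stay escaped.

```csharp
using System.Net;

// Hypothetical helper, not part of the original post.
static class DecodeDemo {
    // Decodes one level of character entities across the entire serialized string.
    public static string DecodeOnce(string serializedXml) {
        return WebUtility.HtmlDecode(serializedXml);
    }
}

// DecodeDemo.DecodeOnce("<myElement>this &amp;amp; that</myElement>")
//   returns <myElement>this &amp; that</myElement>
// DecodeDemo.DecodeOnce("<a>1 &lt; 2</a>")
//   returns <a>1 < 2</a>, which is no longer well-formed XML
```

So this only seems safe when every piece of text in the document really was escaped twice.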
Thanks all for the helpful suggestions.
Ok, here's the solution I came up with:
using System.IO;
using System.Text;
using System.Xml;

namespace YourName {
    // Represents a writer that makes it possible to pre-process
    // XML character entity escapes without them being rewritten.
    class XmlRawTextWriter : XmlTextWriter {
        public XmlRawTextWriter(Stream w, Encoding encoding)
            : base(w, encoding) {
        }

        public XmlRawTextWriter(string filename, Encoding encoding)
            : base(filename, encoding) {
        }

        // Write text exactly as given instead of escaping it.
        // The text must already be valid, escaped XML content.
        public override void WriteString(string text) {
            base.WriteRaw(text);
        }
    }
}
Then use it as you would an XmlTextWriter:
XmlRawTextWriter rawWriter = new XmlRawTextWriter(thisFilespec, Encoding.UTF8);
rawWriter.Formatting = Formatting.Indented;
rawWriter.Indentation = 1;
rawWriter.IndentChar = '\t';
xmlDoc.Save(rawWriter);
rawWriter.Close(); // release the file handle once the document is written
This works without having to un-encode or hack around the encoding functionality.
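For anyone who wants to try it without touching a file, here is a self-contained sketch that saves to a MemoryStream instead. RawWriterDemo and its Save method are hypothetical names, and the element text is assumed to have been set through InnerText; the writer class is repeated only so the snippet compiles on its own.

```csharp
using System.IO;
using System.Text;
using System.Xml;

// Same idea as the writer above, repeated so this snippet stands alone.
class XmlRawTextWriter : XmlTextWriter {
    public XmlRawTextWriter(Stream w, Encoding encoding) : base(w, encoding) { }

    // Pass text through untouched; it must already be escaped.
    public override void WriteString(string text) {
        base.WriteRaw(text);
    }
}

// Hypothetical helper, not part of the original solution.
static class RawWriterDemo {
    // Builds a one-element document and returns what the raw writer emits.
    public static string Save(string preEscapedText) {
        var doc = new XmlDocument();
        var el = doc.CreateElement("myElement");
        el.InnerText = preEscapedText; // stored literally, escapes included
        doc.AppendChild(el);

        using (var ms = new MemoryStream()) {
            // UTF8Encoding(false) avoids a byte order mark in the output.
            var writer = new XmlRawTextWriter(ms, new UTF8Encoding(false));
            doc.Save(writer);
            writer.Flush();
            return Encoding.UTF8.GetString(ms.ToArray());
        }
    }
}

// RawWriterDemo.Save("this &amp; that") contains
//   <myElement>this &amp; that</myElement>
// RawWriterDemo.Save("&#8226; bullet") contains
//   <myElement>&#8226; bullet</myElement>
```

One caveat with this design: because nothing is escaped any more, any text that was not pre-processed and still holds a bare '&' or '<' is written as-is and yields a malformed file. As far as I can tell, attribute values also go through WriteString when the document is saved, so they need to be pre-escaped too.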