How to write '&' in xml?

Giri · Feb 22, 2013 · Viewed 15.6k times

I am using XmlTextWriter to create the XML.

writer.WriteStartElement("book"); 
writer.WriteAttributeString("author", "j.k.rowling"); 
writer.WriteAttributeString("year", "1990");
writer.WriteString("&");
writer.WriteEndElement();

But now I need to write '&', and XmlTextWriter automatically writes it as "&amp;". Is there any workaround for this?

I am creating the XML by reading a doc file. So if I read "-", then in the XML I need to write "&ndash;". But when I write it, it comes out as "&amp;ndash;".

So, for example, if I am trying to write a node with the text good-bad, I actually need to write my XML as <node>good&ndash;bad</node>. This is a requirement of my project.

Answer

psubsee2003 · Feb 22, 2013

In a proper XML file, you cannot have a standalone & character; it may only appear as the start of an escape sequence. So if you need an XML node to contain the text good&ndash;bad, it has to be encoded as good&amp;ndash;bad. There is no workaround, as anything different would not be valid XML. The only way to make it work is to write the file as plain text exactly how you want it, but then an XML parser could not read it, as it is not proper XML.

Here's a code example of my suggested workaround (you didn't specify a language, so I am showing you in C#, but Java should have something similar):

using(var sw = new StreamWriter(stream))
{
    // other code to write XML-like data
    sw.WriteLine("<node>good&ndash;bad</node>");
    // other code to write XML-like data
}

As you discovered, another option is the WriteRaw() method on XmlTextWriter (in C#), which writes an unencoded string, but that does not change the fact that the result is not going to be a valid XML file.
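To make the difference concrete, here is a minimal sketch comparing WriteString() and WriteRaw(). The class name and the BuildNode helper are hypothetical, introduced only for illustration, and the XML is written to a StringWriter so the result can be printed:

```csharp
using System;
using System.IO;
using System.Xml;

static class RawVersusEscaped
{
    // Hypothetical helper: builds <node>...</node>, either escaped or raw.
    public static string BuildNode(string text, bool raw)
    {
        var sw = new StringWriter();
        using (var writer = new XmlTextWriter(sw))
        {
            writer.WriteStartElement("node");
            if (raw)
                writer.WriteRaw(text);      // written verbatim, no escaping
            else
                writer.WriteString(text);   // '&' is escaped as "&amp;"
            writer.WriteEndElement();
        }
        return sw.ToString();
    }

    static void Main()
    {
        // prints <node>good&amp;ndash;bad</node>
        Console.WriteLine(BuildNode("good&ndash;bad", raw: false));
        // prints <node>good&ndash;bad</node>
        Console.WriteLine(BuildNode("good&ndash;bad", raw: true));
    }
}
```

The raw version looks like what the project asks for, but nothing checks that the output is still something a parser will accept.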

But as I mentioned, if you tried to read this with an XML parser, it would fail, because &ndash; is not one of XML's predefined character entities and, unless a DTD declares it, the parser cannot resolve it.
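A quick way to see the failure is to feed both forms to a parser. This is only a sketch: the class name and the Parses helper are made up for the example, and XmlDocument stands in for whatever parser will eventually read the file:

```csharp
using System;
using System.Xml;

static class UndeclaredEntityCheck
{
    // Hypothetical helper: true if the fragment parses as well-formed XML.
    public static bool Parses(string xml)
    {
        try
        {
            new XmlDocument().LoadXml(xml);
            return true;
        }
        catch (XmlException)
        {
            // Thrown, for example, for a reference to an undeclared entity.
            return false;
        }
    }

    static void Main()
    {
        Console.WriteLine(Parses("<node>good&ndash;bad</node>"));     // False
        Console.WriteLine(Parses("<node>good&amp;ndash;bad</node>")); // True
    }
}
```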

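&ndash; is an HTML character entity, so using it in an XML document should not normally be necessary. XML can carry the en dash character itself, or the numeric character reference &#8211;, and a parser will read either one.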
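If the consumer of the file is a real XML parser, a sketch like the following avoids the named entity altogether. It assumes the en dash (U+2013) is the character wanted; the class and method names are hypothetical. WriteCharEntity() emits a numeric character reference, while WriteString() simply writes the character:

```csharp
using System;
using System.IO;
using System.Xml;

static class EnDashWithoutNamedEntity
{
    // Writes the en dash as a literal character inside the text.
    public static string WriteLiteral()
    {
        var sw = new StringWriter();
        using (var writer = new XmlTextWriter(sw))
        {
            writer.WriteStartElement("node");
            writer.WriteString("good\u2013bad");
            writer.WriteEndElement();
        }
        return sw.ToString();
    }

    // Writes the en dash as a numeric character reference.
    public static string WriteNumeric()
    {
        var sw = new StringWriter();
        using (var writer = new XmlTextWriter(sw))
        {
            writer.WriteStartElement("node");
            writer.WriteString("good");
            writer.WriteCharEntity('\u2013');
            writer.WriteString("bad");
            writer.WriteEndElement();
        }
        return sw.ToString();
    }

    // Parses the fragment and returns the text of the root element.
    public static string ReadBack(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        return doc.DocumentElement.InnerText;
    }

    static void Main()
    {
        // Both forms parse to the same text, so this prints True.
        Console.WriteLine(ReadBack(WriteLiteral()) == ReadBack(WriteNumeric()));
    }
}
```

This does not satisfy a requirement that the file literally contain &ndash;, but it produces a file that parsers accept and that carries the same character.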

In the XML language, & is the escape character, so &amp; is the appropriate string representation of &. You cannot use just a & character, because it has a special meaning and a single & would be misinterpreted by the parser.

You will see similar behavior with the <, >, ", and ' characters. All have meaning within the XML language, so if you need to represent them as text in a document, they have to be escaped as well.

Here's a reference to all of the character entities in XML (and HTML) from Wikipedia. Each is always represented by the escape character, the name, and a closing semicolon (&gt;, &lt;, &quot;, &apos;).
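To see all five predefined entities at once, here is a small sketch using SecurityElement.Escape from System.Security, which replaces each of the five special characters with its entity. The class name and the sample string are made up for the example; XmlTextWriter does its own escaping, so this helper is only for illustration:

```csharp
using System;
using System.Security;

static class PredefinedEntities
{
    // Replaces <, >, ", ' and & with their predefined XML entities.
    public static string Escape(string text)
    {
        return SecurityElement.Escape(text);
    }

    static void Main()
    {
        // prints &lt;a href=&quot;x&quot;&gt;Tom &amp; Jerry&apos;s&lt;/a&gt;
        Console.WriteLine(Escape("<a href=\"x\">Tom & Jerry's</a>"));
    }
}
```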