I am using XmlTextWriter to create the XML.
writer.WriteStartElement("book");
writer.WriteAttributeString("author", "j.k.rowling");
writer.WriteAttributeString("year", "1990");
writer.WriteString("&");
writer.WriteEndElement();
But now I need to write '&', and XmlTextWriter automatically writes it as "&amp;". Is there any workaround for this?
I am creating the XML by reading a doc file. If I read "-", then I need to write "&ndash;" in the XML, but it gets written as "&amp;ndash;" instead.
So, for example, if I am trying to write a node with the text good-bad, I actually need my XML to contain <node>good&ndash;bad</node>. This is a requirement of my project.
In a proper XML file, you cannot have a standalone & character unless it starts an escape sequence. So if you need an XML node to contain the text good&ndash;bad, then it will have to be encoded as good&amp;ndash;bad. There is no workaround, as anything different would not be valid XML. The only way to make it work is to write the file as plain text exactly how you want it, but then it could not be read by an XML parser because it is not proper XML.
Here's a code example of my suggested workaround (you didn't specify a language, so I am showing you in C#, but Java should have something similar):
using (var sw = new StreamWriter(stream))
{
    // other code to write XML-like data
    sw.WriteLine("<node>good&ndash;bad</node>");
    // other code to write XML-like data
}
As you discovered, another option is the WriteRaw() method on XmlTextWriter (in C#), which writes an unencoded string, but that does not change the fact that the result is not going to be a valid XML file.
But as I mentioned, if you tried to read this with an XML parser, it would fail because &ndash; is not one of XML's predefined character entities, so the document is not valid XML.
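To make the trade-off concrete, here is a minimal sketch of the WriteRaw() approach followed by an attempt to parse the result. The class and method names (RawEntityDemo, WriteWithRaw, Parses) are made up for illustration, and the output is written to an in-memory StringWriter rather than a file.

```csharp
using System;
using System.IO;
using System.Xml;

public static class RawEntityDemo
{
    // Writes a <node> element whose content bypasses the writer's escaping.
    public static string WriteWithRaw()
    {
        var sw = new StringWriter();
        using (var writer = new XmlTextWriter(sw))
        {
            writer.WriteStartElement("node");
            writer.WriteRaw("good&ndash;bad"); // emitted as-is, no escaping
            writer.WriteEndElement();
        }
        return sw.ToString();
    }

    // Returns true if an XML parser accepts the text, false if it rejects it.
    public static bool Parses(string xml)
    {
        try
        {
            new XmlDocument().LoadXml(xml);
            return true;
        }
        catch (XmlException)
        {
            // Thrown for malformed input, e.g. a reference to an undeclared entity.
            return false;
        }
    }

    public static void Main()
    {
        string xml = WriteWithRaw();
        Console.WriteLine(xml);          // the raw &ndash; reference is in the output
        Console.WriteLine(Parses(xml));  // whether a parser accepts it
    }
}
```

The writer produces the text the project asks for, but because ndash is never declared, the parser has nothing to resolve the reference against.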
&ndash; is an HTML character entity, so escaping it in an XML document should not normally be necessary.
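If the consumer only needs the en dash character itself and not the literal text &ndash;, one possible approach is to write the character (U+2013) directly, or as a numeric character reference via WriteCharEntity(). The sketch below assumes that is acceptable for the project. The names DashDemo, WriteLiteral, WriteCharRef and ReadBack are hypothetical.

```csharp
using System;
using System.IO;
using System.Xml;

public static class DashDemo
{
    // Writes the en dash as an ordinary character in the element text.
    public static string WriteLiteral()
    {
        var sw = new StringWriter();
        using (var writer = new XmlTextWriter(sw))
        {
            writer.WriteStartElement("node");
            writer.WriteString("good\u2013bad"); // U+2013 EN DASH
            writer.WriteEndElement();
        }
        return sw.ToString();
    }

    // Writes the en dash as a numeric character reference instead.
    public static string WriteCharRef()
    {
        var sw = new StringWriter();
        using (var writer = new XmlTextWriter(sw))
        {
            writer.WriteStartElement("node");
            writer.WriteString("good");
            writer.WriteCharEntity('\u2013');
            writer.WriteString("bad");
            writer.WriteEndElement();
        }
        return sw.ToString();
    }

    // Parses the XML and returns the text content of the root element.
    public static string ReadBack(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        return doc.DocumentElement.InnerText;
    }

    public static void Main()
    {
        // Both forms should parse back to the same text.
        Console.WriteLine(ReadBack(WriteLiteral()) == ReadBack(WriteCharRef()));
    }
}
```

Both variants stay well-formed because a numeric character reference needs no declaration, unlike a named entity such as ndash.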
In the XML language, & is the escape character, so &amp; is the appropriate string representation of &. You cannot use just a & character because it has a special meaning, and a single & would therefore be misinterpreted by the parser.
You will see similar behavior with the <, >, ", and ' characters. All of them have meaning within the XML language, so they have to be escaped if you need to represent them as text in a document.
Here's a reference to all of the character entities in XML (and HTML) from Wikipedia. Each will always be represented by the escape character and the name (&gt;, &lt;, &quot;, &apos;).
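As a rough illustration of why this escaping is harmless, the sketch below writes a string containing all five special characters as both an attribute value and element text, then parses it back. The names EscapeDemo and WriteText, and the "title" attribute, are invented for this example. Exactly which characters get escaped can differ between text and attribute contexts, so the code only checks the round trip and does not assume a particular serialized form.

```csharp
using System;
using System.IO;
using System.Xml;

public static class EscapeDemo
{
    // Writes the same string as an attribute value and as element text,
    // letting XmlTextWriter apply whatever escaping each context needs.
    public static string WriteText(string text)
    {
        var sw = new StringWriter();
        using (var writer = new XmlTextWriter(sw))
        {
            writer.WriteStartElement("node");
            writer.WriteAttributeString("title", text);
            writer.WriteString(text);
            writer.WriteEndElement();
        }
        return sw.ToString();
    }

    public static void Main()
    {
        string original = "Tom & Jerry <\"quoted\"> 'text'";
        string xml = WriteText(original);
        Console.WriteLine(xml); // the serialized form, with escapes applied

        var doc = new XmlDocument();
        doc.LoadXml(xml);
        // The parser undoes the escaping, so the original string comes back.
        Console.WriteLine(doc.DocumentElement.InnerText == original);
        Console.WriteLine(doc.DocumentElement.GetAttribute("title") == original);
    }
}
```

In other words, a reader of the file never sees &amp; as data. It only appears in the serialized text.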