HTML inside XML: should I use CDATA or encode the HTML?

alberto · Sep 9, 2009 · Viewed 90.7k times

I am using XML to share HTML content. AFAIK, I could embed the HTML by either:

  • Encoding it: I don't know whether this is completely safe, and I would have to decode it again.

  • Using CDATA sections: I believe I could still have problems if the content contains the closing delimiter "]]>" or certain hexadecimal characters. On the other hand, the XML parser would extract the content transparently for me.

Which option should I choose?

UPDATE: The XML will be created in Java and passed as a string to a .NET web service, where it will be parsed back. Therefore I need to be able to export the XML as a string and load it using "doc.LoadXml(xmlString);"

Answer

Ned Batchelder · Sep 9, 2009

The two options are almost exactly the same. Here are your two choices:

<html>This is &lt;b&gt;bold&lt;/b&gt;</html>

<html><![CDATA[This is <b>bold</b>]]></html>

In both cases, you have to check your string for special characters that need escaping: "&" and "<" in the encoded form, and "]]>" in the CDATA form. Lots of people pretend that CDATA sections don't need any escaping, but as you point out, you have to make sure that "]]>" doesn't slip in unescaped.
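
Since the XML is produced in Java, the two kinds of escaping could be sketched roughly as follows. The class and method names (XmlHtmlEscaping, escapeText, toCdata) are hypothetical helpers made up for illustration, not part of any library, and the sketch assumes the HTML goes into element content rather than an attribute value. It also does nothing about characters that XML 1.0 forbids outright (such as U+0000), which neither approach can carry.

```java
// Hypothetical helpers for illustration only; not a library API.
final class XmlHtmlEscaping {

    /** Escapes text for use as XML element content (not attribute values). */
    static String escapeText(String s) {
        StringBuilder out = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                default:  out.append(c);
            }
        }
        return out.toString();
    }

    /**
     * Wraps text in a CDATA section. Any "]]>" in the input is split so that
     * "]]" ends one section and ">" starts the next one.
     */
    static String toCdata(String s) {
        return "<![CDATA[" + s.replace("]]>", "]]]]><![CDATA[>") + "]]>";
    }

    public static void main(String[] args) {
        String html = "This is <b>bold</b>";
        // prints <html>This is &lt;b&gt;bold&lt;/b&gt;</html>
        System.out.println("<html>" + escapeText(html) + "</html>");
        // prints <html><![CDATA[This is <b>bold</b>]]></html>
        System.out.println("<html>" + toCdata(html) + "</html>");
        // prints <html><![CDATA[a ]]]]><![CDATA[> b]]></html>
        System.out.println("<html>" + toCdata("a ]]> b") + "</html>");
    }
}
```

In practice, building the document with an XML API and letting its serializer do the escaping is usually less error-prone than concatenating strings by hand; the helpers above only make the rules visible.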

In both cases, the XML processor will return your string to you decoded.
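
To check this, here is a minimal sketch that parses both forms and compares what comes back. It uses Java's built-in DOM parser as a stand-in for the .NET "doc.LoadXml(xmlString);" call from the question, on the assumption that any conforming XML parser behaves the same way here; RoundTripDemo and rootText are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class; the names are for illustration only.
final class RoundTripDemo {

    /** Parses an XML string and returns the text content of its root element. */
    static String rootText(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            // getTextContent() concatenates text nodes and CDATA sections.
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String encoded = "<html>This is &lt;b&gt;bold&lt;/b&gt;</html>";
        String cdata = "<html><![CDATA[This is <b>bold</b>]]></html>";
        // A "]]>" in the content, split across two CDATA sections.
        String split = "<html><![CDATA[a ]]]]><![CDATA[> b]]></html>";

        System.out.println(rootText(encoded)); // prints This is <b>bold</b>
        System.out.println(rootText(cdata));   // prints This is <b>bold</b>
        System.out.println(rootText(split));   // prints a ]]> b
    }
}
```

Both forms give back the same string, so the receiving side does not need to know which one the sender used.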