When to CDATA vs. Escape & Vice Versa?

Carlos S · Jun 9, 2009 · Viewed 11.5k times

I'm creating XML documents with values fetched from a DB. Occasionally, due to a legacy implementation, I'll pull back a value that contains a character that's invalid unless properly escaped (& for example).

So the question becomes: should I use CDATA or escape? Are certain situations more appropriate for one vs. the other?

Examples:

<Email>foo&[email protected]</Email>

I'd lean towards CDATA here.

<Name>Bob & Tom</Name>

I'd lean towards escaping here.

I want to avoid blindly CDATA'ing every time, but from a performance perspective it seems like the logical choice. It should always be faster than looking for an invalid char and wrapping only if one exists.

Thoughts?

Answer

Eddie · Jun 9, 2009

CDATA is primarily useful, IMO, for human readability. As far as a machine is concerned, there's no difference between CDATA and escaped text other than, at most, the length. Perhaps the escaped version will take a little longer to process, but I say perhaps because this shouldn't be a significant factor unless your application is mostly IO-bound.
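To make the "no difference to a machine" point concrete, here is a minimal sketch using Python's standard `xml.etree.ElementTree` (the language is my choice for illustration; the question doesn't name one). It parses the `<Name>` example both ways and compares the results.

```python
import xml.etree.ElementTree as ET

# The same value, once escaped and once wrapped in a CDATA section.
escaped = ET.fromstring("<Name>Bob &amp; Tom</Name>")
cdata = ET.fromstring("<Name><![CDATA[Bob & Tom]]></Name>")

# After parsing, both elements carry identical character data.
print(escaped.text)                # Bob & Tom
print(escaped.text == cdata.text)  # True
```

Any conforming parser should behave the same way here: the consumer of the document sees only the text, not which form produced it.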

Are people likely to be reading the XML? If not, just let the XML parser do what it does and don't worry about CDATA vs escaped text. If people will be reading this XML, then perhaps CDATA can be the better choice.
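As a rough sketch of letting the XML library handle it: if the document is built through an API rather than by string concatenation, the serializer escapes special characters on its own, so there is nothing to scan for by hand. The example below assumes Python's `xml.etree.ElementTree`; the `build_contact` helper, the `Contact` wrapper element, and the e-mail address are hypothetical, made up for illustration.

```python
import xml.etree.ElementTree as ET

def build_contact(name: str, email: str) -> str:
    """Build a small document; values are assigned raw, not pre-escaped."""
    contact = ET.Element("Contact")  # hypothetical wrapper element
    ET.SubElement(contact, "Name").text = name
    ET.SubElement(contact, "Email").text = email
    # The serializer escapes &, < and > in text content while writing.
    return ET.tostring(contact, encoding="unicode")

# Hypothetical values standing in for data fetched from the DB.
xml_text = build_contact("Bob & Tom", "foo&bar@example.com")
print(xml_text)
# <Contact><Name>Bob &amp; Tom</Name><Email>foo&amp;bar@example.com</Email></Contact>
```

Parsing `xml_text` again gives back the original, unescaped values.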

If you're going to have an XML element whose value is itself XML, then CDATA may be the better choice for that case.

For more information, see for example the XML FAQ's question, "When should I use a CDATA Marked Section?"
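For the XML-inside-XML case, a minimal sketch in Python might look like the following. The `cdata_section` helper and the `Payload` and `note` element names are hypothetical. One caveat worth knowing: a CDATA section cannot contain the sequence `]]>`, so even "blind" wrapping has to look for that sequence and split it across two sections.

```python
import xml.etree.ElementTree as ET

def cdata_section(text: str) -> str:
    """Wrap text in CDATA (hypothetical helper, not a library function)."""
    # "]]>" would end the section early, so split it across two sections.
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"

# An XML fragment carried as the text value of another element.
payload = '<note priority="high">Bob & Tom</note>'
document = "<Payload>" + cdata_section(payload) + "</Payload>"
print(document)
# <Payload><![CDATA[<note priority="high">Bob & Tom</note>]]></Payload>

# The parser hands the fragment back as plain text, unchanged.
print(ET.fromstring(document).text == payload)  # True
```

The embedded fragment stays readable in the raw document, whereas the escaped form would turn every `<` into `&lt;` and every `&` into `&amp;`.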