How to transform &nbsp; in XSLT

zosim · Nov 24, 2011 · Viewed 11.9k times

I have the following XSLT:

<span><xsl:text disable-output-escaping="yes"><![CDATA[&nbsp;Some text]]></xsl:text></span>

After transformation I get:

<span>&amp;nbsp;Some text</span>

which is rendered as: &nbsp;Some text

I want &nbsp; to be rendered as a space character. I have also tried changing disable-output-escaping to no, but it didn't help.

Thanks for the help.

Answer

jasso · Nov 24, 2011

The other two answers are correct, but I decided to take a somewhat broader view of the subject.

What everyone should know about CDATA sections

A CDATA section is just an alternative serialization form of an escaped XML string. This means that a parser produces the same result for <span><![CDATA[ a & b < 2 ]]></span> and <span> a &amp; b &lt; 2 </span>. XML applications work on the parsed data, so an XML application should produce the same output for both example input elements.

Briefly: escaped data and un-escaped data inside a CDATA section mean exactly the same thing.

In this case

<span><xsl:text disable-output-escaping="yes"><![CDATA[&nbsp;Some text]]></xsl:text></span>

is identical to

<span><xsl:text disable-output-escaping="yes">&amp;nbsp;Some text</xsl:text></span>

Note that the & character has been escaped to &amp; in the latter serialization form.
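The equivalence is easy to check with any XML parser. The sketch below uses Python's standard xml.etree.ElementTree module purely as an illustration; the choice of parser, the helper text_of_xsl_text and the variable names are assumptions, not part of the question. The xmlns:xsl declaration is added only so that each fragment is well-formed on its own.

```python
import xml.etree.ElementTree as ET

XSL_NS = "http://www.w3.org/1999/XSL/Transform"

# The same fragment, once with a CDATA section and once with escaped text.
# The xmlns:xsl declaration is added so each fragment parses on its own.
cdata_form = (
    f'<span xmlns:xsl="{XSL_NS}">'
    '<xsl:text disable-output-escaping="yes"><![CDATA[&nbsp;Some text]]></xsl:text>'
    '</span>'
)
escaped_form = (
    f'<span xmlns:xsl="{XSL_NS}">'
    '<xsl:text disable-output-escaping="yes">&amp;nbsp;Some text</xsl:text>'
    '</span>'
)


def text_of_xsl_text(xml_source):
    """Return the character data the parser reports inside <xsl:text>."""
    root = ET.fromstring(xml_source)
    return root.find(f"{{{XSL_NS}}}text").text


cdata_text = text_of_xsl_text(cdata_form)
escaped_text = text_of_xsl_text(escaped_form)

print(repr(cdata_text))            # '&nbsp;Some text'
print(cdata_text == escaped_text)  # True
```

In both cases the text node holds the six characters of "&nbsp;" followed by "Some text", so a stylesheet cannot tell which spelling was used in its source.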

What everyone should know about disable-output-escaping

disable-output-escaping is a feature that concerns serialization only. To keep the serialized XML well-formed, XSLT processors escape & and < (and possibly other characters) as entity references; their escaped forms are &amp; and &lt;. Escaped or not, the XML data is the same. The XSLT elements <xsl:value-of> and <xsl:text> can carry a disable-output-escaping attribute, but it is generally advised to avoid this feature, for the following reasons:

  • An XSLT processor may produce only a result tree, which is passed on to another process without being serialized in between. In that case disabling output escaping will fail, because the XSLT processor does not control the serialization of the result tree.
  • An XSLT processor is not required to support the disable-output-escaping attribute. In that case the processor must escape the output (or it may raise an error), so again, disabling output escaping will fail.
  • An XSLT processor must escape characters that cannot be represented as such in the encoding used for the document output. Using disable-output-escaping on such characters will result in an error or in escaped text, so again, disabling output escaping will fail.
  • Disabling output escaping easily leads to malformed or invalid XML, so using it requires great care or post-processing of the output with non-XML tools.
  • disable-output-escaping is often misunderstood and misused, and the same result can usually be achieved in more regular ways, e.g. by creating new elements as literals or with <xsl:element>.

In this case

<span><xsl:text disable-output-escaping="yes"><![CDATA[&nbsp;Some text]]></xsl:text></span>

should output

<span>&nbsp;Some text</span>

but the & character got escaped instead, so in this case disabling output escaping seems to have failed.
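The observed output is what any escaping serializer produces for that text node. As a rough analogy only (Python's xml.etree.ElementTree is not an XSLT processor and offers no equivalent of disable-output-escaping), the sketch below builds the same one-element result tree by hand and serializes it; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# Build the result tree by hand: a <span> whose text node holds the
# six characters "&nbsp;" followed by "Some text".
span = ET.Element("span")
span.text = "&nbsp;Some text"

# Serializing escapes the ampersand so that the output stays well-formed.
serialized = ET.tostring(span, encoding="unicode")
print(serialized)  # <span>&amp;nbsp;Some text</span>

# Parsing the output again gives back the original text node unchanged.
round_tripped = ET.fromstring(serialized).text
print(round_tripped == span.text)  # True
```

The escaped output is therefore a faithful serialization of the tree; it is the tree's content (a literal ampersand followed by "nbsp;") that does not match what the asker wanted.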

What everyone should know about using entities

If an XML document contains an entity reference, the entity must be declared; if it is not, the document is not valid. XML has only 5 predefined entities. They are:

  • &amp; for &
  • &lt; for <
  • &gt; for >
  • &quot; for "
  • &apos; for '

All other entities must be declared either in an internal DTD of the document or in an external DTD that the document refers to (directly or indirectly). Therefore, blindly adding entity references to an XML document might result in invalid documents. Documents with an (X)HTML DOCTYPE can use several entities (like &nbsp;) because the XHTML DTD refers to a DTD that contains their definitions. The entities are defined in these three DTDs: http://www.w3.org/TR/html4/HTMLlat1.ent , http://www.w3.org/TR/html4/HTMLsymbol.ent and http://www.w3.org/TR/html4/HTMLspecial.ent .
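A quick way to see the difference between predefined and undeclared entities is to feed both to a parser. This is a minimal sketch using Python's standard xml.etree.ElementTree (an assumption made for illustration; other parsers report the problem with different messages or error types).

```python
import xml.etree.ElementTree as ET

# The five predefined entities need no declaration.
predefined = ET.fromstring("<p>&amp;&lt;&gt;&quot;&apos;</p>").text
print(predefined)  # &<>"'

# &nbsp; is not predefined, so a document without a DTD that uses it
# is rejected by the parser.
try:
    ET.fromstring("<span>&nbsp;Some text</span>")
    undeclared_error = None
except ET.ParseError as exc:
    undeclared_error = exc

print(undeclared_error is not None)  # True
```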

An entity reference does not always get replaced with its replacement text. This can happen, for example, if the parser has no network connection to retrieve the DTD. Also, non-validating parsers are not required to read external DTDs, so they need not include the replacement text of entities declared there. In such cases the data represented by the entity is "lost". If the entity replacement works, there will be no sign in the parsed data model that the XML serialization contained any entity references at all. The data model will be the same whether one uses entities or their replacement values. Briefly: entities are only an alternative way to represent the replacement text of the entity reference.

In this case the replacement text of &nbsp; is &#160; (which is the same as &#xA0; in hexadecimal notation). Instead of trying to output the &nbsp; entity, it is easier and more robust to just use the solution suggested by @phihag. If you like the readability of the &nbsp; entity, you can follow the solution suggested by @Michael Krelin and define that entity in an internal DTD. After that, you can use it directly within your XSLT code.
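To illustrate that these spellings are interchangeable once parsed, here is a small sketch using Python's standard xml.etree.ElementTree. The parser choice and the sample documents are assumptions for illustration; the internal DTD subset shown is one common way to declare nbsp, not necessarily the exact declaration from the referenced answer.

```python
import xml.etree.ElementTree as ET

# Three spellings of the same content. The first declares nbsp in an
# internal DTD subset; the other two use character references directly.
with_entity = '<!DOCTYPE span [<!ENTITY nbsp "&#160;">]><span>&nbsp;Some text</span>'
with_decimal = "<span>&#160;Some text</span>"
with_hex = "<span>&#xA0;Some text</span>"

texts = [ET.fromstring(doc).text for doc in (with_entity, with_decimal, with_hex)]

# Every form parses to U+00A0 (NO-BREAK SPACE) followed by "Some text".
print([ascii(t) for t in texts])
print(all(t == "\u00a0Some text" for t in texts))  # True
```

The same kind of DOCTYPE declaration can be placed at the top of a stylesheet, in which case the XSLT processor sees a text node containing U+00A0 wherever the source says &nbsp;.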

Do note that in both cases the XSLT processor will output the literal non-breaking space character and not the &nbsp; entity reference or the &#160; character reference. Creating such references manually with XSLT 1.0 requires the disable-output-escaping feature, which has its own problems, as stated above.
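Whether the character appears literally in the output also depends on the output encoding, as noted in the list of reasons above. The following sketch shows the effect with Python's standard xml.etree.ElementTree serializer, used here only as an analogy for what an XSLT serializer does; it is not an XSLT processor, and the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

span = ET.Element("span")
span.text = "\u00a0Some text"  # text node starting with U+00A0

# UTF-8 can represent U+00A0, so the character is written literally
# (as the two bytes C2 A0).
as_utf8 = ET.tostring(span, encoding="utf-8")

# US-ASCII cannot represent it, so the serializer falls back to a
# numeric character reference.
as_ascii = ET.tostring(span, encoding="us-ascii")

print(as_utf8)
print(as_ascii)
```

Either way, a consumer that parses the output sees the same text node; only the byte-level spelling differs.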