How to transform &nbsp; in XSLT

Question 1

How to transform &nbsp; in XSLT

html xslt cdata

zosim · Nov 24, 2011 · Viewed 11.9k times · Source

Answer

The other two answers are correct, but I decided to take a slightly broader view of this subject.

What everyone should know about CDATA sections

A CDATA section is just an alternative serialization form of an escaped XML string. This means that a parser produces the same result for <span><![CDATA[ a & b < 2 ]]></span> and <span> a &amp; b &lt; 2 </span>. XML applications work on the parsed data, so an XML application should produce the same output for both example input elements.

Briefly: escaped data and unescaped data inside a CDATA section mean exactly the same thing.

In this case

<span><xsl:text disable-output-escaping="yes"><![CDATA[&nbsp;Some text]]></xsl:text></span>

is identical to

<span><xsl:text disable-output-escaping="yes">&amp;nbsp;Some text</xsl:text></span>

Note that the & character has been escaped to &amp; in the latter serialization form.
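
The equivalence described above can be checked with a small stylesheet. This is only a minimal sketch, not taken from the question: the out wrapper element is a made-up name used for illustration, and the stylesheet can be run against any input document. Both xsl:text instructions hold the same parsed text, so a conforming XSLT 1.0 processor should serialize them identically.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" omit-xml-declaration="yes"/>

  <!-- "out" is a hypothetical wrapper element, used only for this demo. -->
  <xsl:template match="/">
    <out>
      <!-- Text written as a CDATA section -->
      <span><xsl:text><![CDATA[ a & b < 2 ]]></xsl:text></span>
      <!-- The same text written with escaped characters -->
      <span><xsl:text> a &amp; b &lt; 2 </xsl:text></span>
    </out>
  </xsl:template>

</xsl:stylesheet>
```

With the xml output method, both span elements should be written as <span> a &amp; b &lt; 2 </span>, because the serializer escapes & and < no matter how they were written in the stylesheet.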

What everyone should know about `disable-output-escaping`

disable-output-escaping is a feature that concerns serialization only. To keep the serialized XML well-formed, XSLT processors escape & and < (and possibly other characters) by using entities. Their escaped forms are &amp; and &lt;. Escaped or not, the XML data is the same. The XSLT elements <xsl:value-of> and <xsl:text> can have a disable-output-escaping attribute, but it is generally advised to avoid using this feature. The reasons for this are:

An XSLT processor may produce only a result tree, which is passed on to another process without being serialized in between. In such a case disabling output escaping will fail because the XSLT processor does not control the serialization of the result tree.
An XSLT processor is not required to support the disable-output-escaping attribute. In such a case the processor must escape the output (or it may raise an error), so again, disabling output escaping will fail.
An XSLT processor must escape characters that cannot be represented as such in the encoding used for the document output. Using disable-output-escaping on such characters will result in an error or escaped text, so again, disabling output escaping will fail.
Disabling output escaping easily leads to malformed or invalid XML, so using it requires great care or post-processing of the output with non-XML tools.
disable-output-escaping is often misunderstood and misused, and the same result can usually be achieved in more regular ways, e.g. by creating new elements as literals or with <xsl:element>.
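
The last point can be illustrated with a sketch. It assumes, purely for illustration, an input document with item elements whose text should be wrapped in a b element; the element names and the mode names fragile and element are made up. The three templates show the same intent written three ways, and only the default-mode template runs unless a mode is requested explicitly.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="html"/>

  <!-- Fragile: builds markup as text and relies on the serializer
       honouring disable-output-escaping. -->
  <xsl:template match="item" mode="fragile">
    <xsl:text disable-output-escaping="yes">&lt;b&gt;</xsl:text>
    <xsl:value-of select="."/>
    <xsl:text disable-output-escaping="yes">&lt;/b&gt;</xsl:text>
  </xsl:template>

  <!-- Regular: a literal result element creates a real node
       in the result tree. -->
  <xsl:template match="item">
    <b><xsl:value-of select="."/></b>
  </xsl:template>

  <!-- Same result with xsl:element, handy when the name is computed. -->
  <xsl:template match="item" mode="element">
    <xsl:element name="b">
      <xsl:value-of select="."/>
    </xsl:element>
  </xsl:template>

</xsl:stylesheet>
```

The second and third templates keep working when the result tree is handed to another process without serialization, which is exactly the situation where the first one breaks down.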

In this case

<span><xsl:text disable-output-escaping="yes"><![CDATA[&nbsp;Some text]]></xsl:text></span>

should output

<span>&nbsp;Some text</span>

but the & character got escaped instead, so in this case disabling output escaping seems to have failed.

What everyone should know about using entities

If an XML document contains an entity reference, the entity must be declared; if it is not, the document is not valid. XML has only five predefined entities. They are:

&amp; for &
&lt; for <
&gt; for >
&quot; for "
&apos; for '

All other entity references must be defined either in an internal DTD of the document or in an external DTD that the document refers to (directly or indirectly). Therefore, blindly adding entity references to an XML document might result in invalid documents. Documents with an (X)HTML DOCTYPE can use several entities (like &nbsp;) because the XHTML DTD refers to a DTD that contains their definitions. The entities are defined in these three DTDs: http://www.w3.org/TR/html4/HTMLlat1.ent , http://www.w3.org/TR/html4/HTMLsymbol.ent and http://www.w3.org/TR/html4/HTMLspecial.ent .

An entity reference does not always get replaced with its replacement text. This can happen, for example, if the parser has no network connection to retrieve the DTD. Also, non-validating parsers are not required to include the replacement text. In such cases the data represented by the entity is "lost". If the entity replacement works, there will be no sign in the parsed data model that the XML serialization had any entity references at all. The data model will be the same whether one uses entities or their replacement values. Briefly: entities are only an alternative way to represent the replacement text of the entity reference.

In this case the replacement text of &nbsp; is &#160; (which is the same as &#xA0; in hexadecimal notation). Instead of trying to output the &nbsp; entity, it will be easier and more robust to just use the solution suggested by @phihag. If you like the readability of the &nbsp; entity, you can follow the solution suggested by @Michael Krelin and define that entity in an internal DTD. After that, you can use it directly within your XSLT code.

Do note that in both cases the XSLT processor will output the literal non-breaking space character and not the &nbsp; entity reference or the &#160; character reference. Creating such references manually with XSLT 1.0 requires the use of the disable-output-escaping feature, which has its own problems as stated above.
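
The two approaches mentioned above can be sketched in one stylesheet. The other answers are not reproduced here, so this is only an assumption of what they amount to: a numeric character reference on the one hand, and an nbsp entity declared in the stylesheet's internal DTD on the other. The div wrapper is a made-up element used for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Internal DTD subset: declares nbsp for this stylesheet only. -->
<!DOCTYPE xsl:stylesheet [
  <!ENTITY nbsp "&#160;">
]>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" encoding="UTF-8" omit-xml-declaration="yes"/>

  <xsl:template match="/">
    <div>
      <!-- Numeric character reference: needs no declaration. -->
      <span>&#160;Some text</span>
      <!-- Named entity: the XML parser resolves it via the DTD above. -->
      <span>&nbsp;Some text</span>
    </div>
  </xsl:template>

</xsl:stylesheet>
```

The XML parser replaces both references before the XSLT processor sees the stylesheet, so the two span elements contain the same U+00A0 character in the result tree. No disable-output-escaping is involved, and the original CDATA wrapper is not needed.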

Question 2

I have the following XSLT:

<span><xsl:text disable-output-escaping="yes"><![CDATA[&nbsp;Some text]]></xsl:text></span>

After transformation I get:

<span>&amp;nbsp;Some text</span>

which is rendered as: &nbsp;Some text

I want &nbsp; to be rendered as a space character. I have also tried changing disable-output-escaping to no, but it didn't help.

Thanks for help.
