Can I use unencoded ampersands (&) in html?

Peter picture Peter · Jun 27, 2012 · Viewed 9.6k times · Source

I'm building a website where I have to work with less then perfect masterdata (I guess I'm not the only one :-))

In my case I have to render an xml filte to html (using xsl). Sometimes the masterdata is using html-enitites allready (eg ;é in french words) so there I have to use 'disable-output-escaping='yes') there in order to avoid double encoding.

The easiest solution is disable output escaping all together, so I never run the risk of a double encoding.

The only characters that misses encoding for this masterdata are the ampersands. But when I parse them 'raw' (so rather & than & all browsers seem to be ok with it.

So the question : what are the consequenses of using not encoded ampersands in html?

Answer

Razor picture Razor · Jun 27, 2012

It depends

The best research I have seen on this topic can be found here

In HTML5 you should escape all of the ampersands that do not fall in the categories below:

An ambiguous ampersand is a U+0026 AMPERSAND character (&) that is followed by one or more characters in the range U+0030 DIGIT ZERO (0) to U+0039 DIGIT NINE (9), U+0061 LATIN SMALL LETTER A to U+007A LATIN SMALL LETTER Z, and U+0041 LATIN CAPITAL LETTER A to U+005A LATIN CAPITAL LETTER Z, followed by a U+003B SEMICOLON character (;), where these characters do not match any of the names given in the named character references section.