Do I really need to encode '&' as '&'?

Haroldo picture Haroldo · Aug 16, 2010 · Viewed 358.9k times · Source

I'm using an '&' symbol with HTML5 and UTF-8 in my site's <title>. Google shows the ampersand fine on its SERPs, as do all the browsers in their titles.

http://validator.w3.org is giving me this:

& did not start a character reference. (& probably should have been escaped as &amp;.)

Do I really need to do &amp;?

I'm not fussed about my pages validating for the sake of validating, but I'm curious to hear people's opinions on this and if it's important and why.

Answer

Delan Azabani picture Delan Azabani · Aug 16, 2010

Yes. Just as the error said, in HTML, attributes are #PCDATA meaning they're parsed. This means you can use character entities in the attributes. Using & by itself is wrong and if not for lenient browsers and the fact that this is HTML not XHTML, would break the parsing. Just escape it as &amp; and everything would be fine.

HTML5 allows you to leave it unescaped, but only when the data that follows does not look like a valid character reference. However, it's better just to escape all instances of this symbol than worry about which ones should be and which ones don't need to be.

Keep this point in mind; if you're not escaping & to &amp;, it's bad enough for data that you create (where the code could very well be invalid), you might also not be escaping tag delimiters, which is a huge problem for user-submitted data, which could very well lead to HTML and script injection, cookie stealing and other exploits.

Please just escape your code. It will save you a lot of trouble in the future.