When used correctly, is htmlspecialchars sufficient for protection against all XSS?

Alf Eaton · Oct 25, 2013

If the following statements are true,

  • All documents are served with the HTTP header Content-Type: text/html; charset=UTF-8.
  • All HTML attributes are enclosed in either single or double quotes.
  • There are no <script> tags in the document.

are there any cases where htmlspecialchars($input, ENT_QUOTES, 'UTF-8') (converting &, ", ', <, > to the corresponding named HTML entities) is not enough to protect against cross-site scripting when generating HTML on a web server?

Answer

bobince · Oct 25, 2013

htmlspecialchars() is enough to prevent document-creation-time HTML injection under the limitations you state (i.e. no injection into the inside of a tag or into an unquoted attribute value).

However, there are other kinds of injection that can lead to XSS, and this condition:

  There are no <script> tags in the document.

doesn't cover all cases of JS injection. You might, for example, have an event handler attribute (which requires JS-escaping inside HTML-escaping):

<div onmouseover="alert('<?php echo htmlspecialchars($xss) ?>')"> // bad!

or, even worse, a javascript: link (requires JS-escaping inside URL-escaping inside HTML-escaping):

<a href="javascript:alert('<?php echo htmlspecialchars($xss) ?>')"> // bad!

It is usually best to avoid these constructs anyway, but especially when templating. Writing <?php echo htmlspecialchars(rawurlencode(json_encode($something))) ?> every time is quite tedious.
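If you really cannot avoid these constructs, the nesting of escapers described above could look roughly like the sketch below. The helper names (js_literal, h, onmouseover_attr, js_href_attr) are made up for illustration and do not come from any library. The sketch assumes PHP 7.3+ for JSON_THROW_ON_ERROR, and it uses rawurlencode() because a literal + is not decoded to a space in a javascript: URL.

```php
<?php
// Hypothetical helpers; the names are illustrative only.

// JS layer: encode a PHP value as a JavaScript literal. The JSON_HEX_* flags
// additionally turn <, >, &, ' and " inside strings into \uXXXX escapes.
function js_literal($value): string
{
    return json_encode(
        $value,
        JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT | JSON_THROW_ON_ERROR
    );
}

// HTML layer: the same call as in the question.
function h(string $text): string
{
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

// Event handler attribute: JS-escaping inside HTML-escaping.
function onmouseover_attr($value): string
{
    return 'onmouseover="' . h('alert(' . js_literal($value) . ')') . '"';
}

// javascript: link: JS-escaping inside URL-escaping inside HTML-escaping.
function js_href_attr($value): string
{
    $script = 'alert(' . js_literal($value) . ')';
    return 'href="' . h('javascript:' . rawurlencode($script)) . '"';
}

// A payload that would break out of the naive alert('...') templates above.
$xss = "');alert(document.cookie);//";
echo '<div ' . onmouseover_attr($xss) . '>hover me</div>', "\n";
echo '<a ' . js_href_attr($xss) . '>click me</a>', "\n";
```

Note that json_encode() supplies its own string delimiters, so the template must not add quotes around the value. Each layer is applied in the reverse of the order in which the browser decodes it: HTML attribute first, then URL, then JavaScript.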

And... injection issues can happen on the client-side as well (DOM XSS); htmlspecialchars() won't protect you against a piece of JavaScript writing to innerHTML (commonly .html() in poor jQuery scripts) without explicit escaping.

And... XSS has a wider range of causes than just injections. Other common causes are:

  • allowing the user to create links, without checking for known-good URL schemes (javascript: is the most well-known harmful scheme but there are more)

  • deliberately allowing the user to create markup, either directly or through light-markup schemes (like bbcode which is invariably exploitable)

  • allowing the user to upload files (which can through various means be reinterpreted as HTML or XML)
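For the first of these causes (user-created links), one possible check is an allow-list of URL schemes, applied before the usual HTML-escaping. The sketch below is an assumption-laden illustration, not a complete validator: is_allowed_link is a hypothetical name, the default scheme list is only an example, and relative URLs are rejected outright to keep it simple.

```php
<?php
// Hypothetical helper: accept a user-supplied link only if it is an absolute
// URL whose scheme is on a known-good list (the default list is an assumption).
function is_allowed_link(string $url, array $schemes = ['http', 'https', 'mailto']): bool
{
    // Browsers ignore some whitespace and control characters in URLs
    // (e.g. "java\tscript:"), so this sketch rejects any input containing them.
    if (preg_match('/[\x00-\x20\x7f]/', $url)) {
        return false;
    }
    // Scheme syntax per RFC 3986: a letter, then letters, digits, "+", "-" or ".".
    if (!preg_match('/^([a-z][a-z0-9+.\-]*):/i', $url, $m)) {
        return false; // no scheme: relative URLs are rejected in this sketch
    }
    // Schemes are case-insensitive, so compare in lower case.
    return in_array(strtolower($m[1]), $schemes, true);
}

// The scheme check does not replace HTML-escaping; both are still needed.
$url = 'https://example.com/?a=1&b=2';
if (is_allowed_link($url)) {
    echo '<a href="' . htmlspecialchars($url, ENT_QUOTES, 'UTF-8') . '">link</a>', "\n";
}
```

An allow-list is used here rather than a block-list of javascript: and friends, since the set of harmful schemes is open-ended.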