XSS filter to remove all scripts

Cool Techie · Jul 9, 2015 · Viewed 14.8k times

I am implementing an XSS filter for my web application and also using the ESAPI encoder to sanitise the input.

The patterns I am using are given below:

// Script fragments
Pattern.compile("<script>(.*?)</script>", Pattern.CASE_INSENSITIVE),
// src='...'
Pattern.compile("src[\r\n]*=[\r\n]*\\\'(.*?)\\\'", Pattern.CASE_INSENSITIVE | Pattern.MULTILINE | Pattern.DOTALL),
Pattern.compile("src[\r\n]*=[\r\n]*\\\"(.*?)\\\"", Pattern.CASE_INSENSITIVE | Pattern.MULTILINE | Pattern.DOTALL),
// lonely script tags
Pattern.compile("</script>", Pattern.CASE_INSENSITIVE),
Pattern.compile("<script(.*?)>", Pattern.CASE_INSENSITIVE | Pattern.MULTILINE | Pattern.DOTALL),
// eval(...)
Pattern.compile("eval\\((.*?)\\)", Pattern.CASE_INSENSITIVE | Pattern.MULTILINE | Pattern.DOTALL),
// expression(...)
Pattern.compile("expression\\((.*?)\\)", Pattern.CASE_INSENSITIVE | Pattern.MULTILINE | Pattern.DOTALL),
// javascript:...
Pattern.compile("javascript:", Pattern.CASE_INSENSITIVE),
// vbscript:...
Pattern.compile("vbscript:", Pattern.CASE_INSENSITIVE),
// onload(...)=...
Pattern.compile("onload(.*?)=", Pattern.CASE_INSENSITIVE | Pattern.MULTILINE | Pattern.DOTALL)

But a few scripts are still not getting filtered, especially ones that are appended to a parameter, like

url?sourceId=abx;alert('hello');

How do I handle these?

Answer

avgvstvs · Jul 9, 2015

This isn't the right approach. It's mathematically impossible to write a regex capable of correctly filtering out XSS. (Regular expressions recognise only regular languages, while HTML and JavaScript are at least context-free.)
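To make that concrete, here is a minimal sketch that applies the patterns from the question in order and deletes every match. The class name `BlacklistFilterDemo`, the `strip` method, and the delete-on-match behaviour are assumptions for illustration (the question does not show how the patterns are applied), and the quote escapes in the two `src` patterns are simplified to an equivalent form.

```java
import java.util.regex.Pattern;

// Hypothetical harness: the question only lists the patterns, so applying them
// in order and deleting each match is an assumption made for this sketch.
final class BlacklistFilterDemo {
    private static final int FLAGS =
            Pattern.CASE_INSENSITIVE | Pattern.MULTILINE | Pattern.DOTALL;

    // Same patterns as in the question; the src quote escapes are simplified.
    private static final Pattern[] PATTERNS = {
        Pattern.compile("<script>(.*?)</script>", Pattern.CASE_INSENSITIVE),
        Pattern.compile("src[\r\n]*=[\r\n]*'(.*?)'", FLAGS),
        Pattern.compile("src[\r\n]*=[\r\n]*\"(.*?)\"", FLAGS),
        Pattern.compile("</script>", Pattern.CASE_INSENSITIVE),
        Pattern.compile("<script(.*?)>", FLAGS),
        Pattern.compile("eval\\((.*?)\\)", FLAGS),
        Pattern.compile("expression\\((.*?)\\)", FLAGS),
        Pattern.compile("javascript:", Pattern.CASE_INSENSITIVE),
        Pattern.compile("vbscript:", Pattern.CASE_INSENSITIVE),
        Pattern.compile("onload(.*?)=", FLAGS)
    };

    // Runs every pattern over the input and removes whatever it matches.
    static String strip(String input) {
        String out = input;
        for (Pattern p : PATTERNS) {
            out = p.matcher(out).replaceAll("");
        }
        return out;
    }

    public static void main(String[] args) {
        // The textbook payload is removed...
        System.out.println("[" + strip("<script>alert(1)</script>") + "]");
        // ...but an unquoted src with an onerror handler matches no pattern,
        System.out.println(strip("<img src=x onerror=alert(1)>"));
        // and neither does the parameter value from the question.
        System.out.println(strip("abx;alert('hello');"));
    }
}
```

Each new pattern added to the list closes one hole and leaves the neighbouring ones open, which is why a blacklist of this kind never converges.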

You can, however, guarantee that whenever you switch contexts (that is, hand off a piece of data that is going to be interpreted), the data is correctly escaped for that context. So, when sending data to a browser, escape it for HTML if it's being handled as HTML, or for JavaScript if it's being handled by JavaScript.
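As a rough illustration of escaping per context, the sketch below uses two hand-rolled helpers, `forHtml` and `forJsString`. Both names are hypothetical and the helpers exist only to keep the example self-contained; since you already use ESAPI, `ESAPI.encoder().encodeForHTML(value)` and `ESAPI.encoder().encodeForJavaScript(value)` are the calls to reach for in real code.

```java
// Illustrative stand-ins only: in production use a maintained encoder such as
// ESAPI rather than these hypothetical helpers.
final class ContextEncoders {

    // For data placed in HTML element content or a quoted attribute value.
    static String forHtml(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&':  sb.append("&amp;");  break;
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '"':  sb.append("&quot;"); break;
                case '\'': sb.append("&#x27;"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    // For data placed inside a quoted JavaScript string literal: every char
    // other than an ASCII letter or digit becomes a four-digit hex escape.
    static String forJsString(String s) {
        StringBuilder sb = new StringBuilder(s.length() * 2);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            boolean alnum = (c >= 'a' && c <= 'z')
                    || (c >= 'A' && c <= 'Z')
                    || (c >= '0' && c <= '9');
            if (alnum) {
                sb.append(c);
            } else {
                sb.append(String.format("\\u%04x", (int) c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String sourceId = "abx;alert('hello');"; // the value from the question

        // HTML context: the value is rendered as text, not parsed as markup.
        System.out.println("<p>" + forHtml(sourceId) + "</p>");

        // JavaScript string context: the value cannot close the literal.
        System.out.println("<script>var id = '" + forJsString(sourceId) + "';</script>");
    }
}
```

The point is that the encoder is chosen by where the data is about to land, not by what the data looks like, so the input itself can be stored and passed around unmodified.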

If you DO need to allow HTML/JavaScript into your application, then you'll want a web-application firewall or a framework like HDIV.