How to allow specific characters with OWASP HTML Sanitizer?

ams · Sep 24, 2012

I am using the OWASP Html Sanitizer to prevent XSS attacks on my web app. For many fields that should be plain text the Sanitizer is doing more than I expect.

For example:

HtmlPolicyBuilder htmlPolicyBuilder = new HtmlPolicyBuilder();
PolicyFactory stripAllTagsPolicy = htmlPolicyBuilder.toFactory();
stripAllTagsPolicy.sanitize("a+b"); // returns a&#43;b
stripAllTagsPolicy.sanitize("foo@example.com"); // returns foo&#64;example.com

When a field such as an email address contains a +, as in [email protected], I end up with the wrong data in the database. So two questions:

  1. Are characters such as + - @ dangerous on their own, and do they really need to be encoded?
  2. How do I configure the OWASP html sanitizer to allow specific characters such as + - @?

Question 2 is the more important one for me to get an answer to.

Answer

Mahendra · Nov 17, 2014

You may want to use the ESAPI API to filter specific characters. However, if you want to allow specific HTML elements or attributes, you can use allowElements and allowAttributes as follows.

// Define the policy.
Function<HtmlStreamEventReceiver, HtmlSanitizer.Policy> policy
    = new HtmlPolicyBuilder()
        .allowElements("a", "p")
        .allowAttributes("href").onElements("a")
        .toFactory();

// Sanitize your output.
HtmlSanitizer.sanitize(myHtml, policy.apply(myHtmlStreamRenderer));
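
For plain-text fields such as email addresses, one possible workaround is to post-process the sanitizer's output and turn a small allow-list of numeric entities back into their characters. The sketch below is not part of the OWASP library. EntityRestorer and restoreAllowed are hypothetical names, and it assumes the sanitizer emits &#43; for + and &#64; for @, and that the stored value is HTML-escaped again whenever it is rendered into a page.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not an OWASP API: restores selected characters
// after a plain-text field has been run through the sanitizer.
final class EntityRestorer {

    // Assumed allow-list: entity as emitted by the sanitizer -> literal character.
    private static final Map<String, String> ALLOWED = new LinkedHashMap<>();
    static {
        ALLOWED.put("&#43;", "+");
        ALLOWED.put("&#64;", "@");
    }

    private EntityRestorer() {}

    // Replaces only the allow-listed entities; everything else
    // (for example &lt; or &amp;) is left encoded.
    static String restoreAllowed(String sanitized) {
        String result = sanitized;
        for (Map.Entry<String, String> entry : ALLOWED.entrySet()) {
            result = result.replace(entry.getKey(), entry.getValue());
        }
        return result;
    }
}
```

It would be called on the sanitized string before saving, for example EntityRestorer.restoreAllowed(stripAllTagsPolicy.sanitize(input)). Because the result is no longer entity-encoded, it should be treated as plain text and escaped at output time.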