How to sanitize HTML code in Java to prevent XSS attacks?

WildWezyr · Aug 5, 2010 · Viewed 49.3k times

I'm looking for a class or utility to sanitize HTML code, i.e. to remove dangerous tags, attributes and values, in order to avoid XSS and similar attacks.

I get the HTML from a rich text editor (e.g. TinyMCE), but it can also be sent maliciously, bypassing TinyMCE's validation ("data submitted from off-site").

Is there anything as simple to use as InputFilter in PHP? The perfect solution I can imagine works like this (assume the sanitizer is encapsulated in an HtmlSanitizer class):

String unsanitized = "...<...>...";           // some potentially 
                                              // dangerous html here on input

HtmlSanitizer sat = new HtmlSanitizer();      // sanitizer util class created

String sanitized = sat.sanitize(unsanitized); // voila - sanitized is safe...

Update: the simpler the solution, the better! A small utility class with as few external dependencies on other libraries/frameworks as possible would be best for me.


How about that?

Answer

Saljack · Aug 4, 2015

You can try the OWASP Java HTML Sanitizer. It is very simple to use.

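import org.owasp.html.HtmlPolicyBuilder;
import org.owasp.html.PolicyFactory;

// Whitelist policy: markup that is not explicitly allowed is stripped.
PolicyFactory policy = new HtmlPolicyBuilder()
    .allowElements("a")                        // <a> is the only element allowed
    .allowUrlProtocols("https")                // https is the only URL protocol allowed
    .allowAttributes("href").onElements("a")   // href is the only attribute kept on <a>
    .requireRelNofollowOnLinks()               // adds rel="nofollow" to links
    .build();

// untrustedHTML holds the raw input, e.g. the content submitted by the editor
String safeHTML = policy.sanitize(untrustedHTML);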
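
The policy above keeps a whitelist of markup and still needs the sanitizer library on the classpath. For fields that do not need to keep any formatting at all (so not the TinyMCE case from the question), a dependency-free baseline is to escape every HTML-significant character instead of sanitizing. The sketch below assumes that use case; the class name HtmlEscaper and the method escape are made up for illustration and are not part of the OWASP library.

```java
/**
 * Hypothetical zero-dependency helper (not part of any library).
 * It does not sanitize: it escapes ALL markup, so any tags in the
 * input are shown as literal text instead of being rendered.
 */
final class HtmlEscaper {

    private HtmlEscaper() {
        // static utility, no instances
    }

    /**
     * Replaces the five characters that are significant in HTML text
     * and in quoted attribute values. A null input yields "".
     */
    static String escape(String input) {
        if (input == null) {
            return "";
        }
        StringBuilder out = new StringBuilder(input.length() + 16);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&#39;");  break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }
}
```

Calling HtmlEscaper.escape(untrustedHTML) before writing a value into an HTML page makes the browser show any injected tags as text. This is only suitable for element content and quoted attribute values; it is not enough inside script blocks, URLs, or unquoted attributes. Rich text that must keep some tags still needs a whitelist sanitizer such as the one above.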