How to unescape HTML character entities in Java?

yinyueyouge · Jun 15, 2009 · Viewed 252.6k times

Basically, I would like to decode a given HTML document and replace all character entities with the characters they stand for, such as "&nbsp;" -> " " and "&gt;" -> ">".

In .NET we can make use of HttpUtility.HtmlDecode.

What's the equivalent function in Java?

Answer

Kevin Hakanson · Jun 15, 2009

I have used StringEscapeUtils.unescapeHtml4() from Apache Commons Lang for this. Its Javadoc describes it as follows:

Unescapes a string containing entity escapes to a string containing the actual Unicode characters corresponding to the escapes. Supports HTML 4.0 entities.
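With Commons Lang 3 on the classpath, the call itself is a one-liner: StringEscapeUtils.unescapeHtml4(html), where the class lives in org.apache.commons.lang3. To illustrate what that kind of unescaping involves, here is a dependency-free sketch. It is not the library's implementation: the class name HtmlUnescapeSketch, the unescape method, and the five-entry entity table are all made up for this example, and real HTML 4 defines many more named entities.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HtmlUnescapeSketch {
    // Hypothetical, deliberately tiny entity table (assumption for this sketch);
    // HTML 4 defines far more named entities than these five.
    private static final Map<String, String> NAMED = Map.of(
            "amp", "&",
            "lt", "<",
            "gt", ">",
            "quot", "\"",
            "nbsp", "\u00A0");

    // Matches &name;, decimal &#123; and hexadecimal &#x7B; references.
    private static final Pattern ENTITY =
            Pattern.compile("&(#[xX][0-9a-fA-F]+|#[0-9]+|[A-Za-z][A-Za-z0-9]*);");

    public static String unescape(String input) {
        Matcher m = ENTITY.matcher(input);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String body = m.group(1);
            String replacement;
            if (body.startsWith("#")) {
                // Numeric reference: parse as hex or decimal code point.
                boolean hex = body.charAt(1) == 'x' || body.charAt(1) == 'X';
                int codePoint;
                try {
                    codePoint = Integer.parseInt(body.substring(hex ? 2 : 1), hex ? 16 : 10);
                } catch (NumberFormatException e) {
                    codePoint = -1; // too large to be a code point
                }
                replacement = Character.isValidCodePoint(codePoint)
                        ? new String(Character.toChars(codePoint))
                        : m.group(); // leave invalid references untouched
            } else {
                // Named reference: unknown names are left as they are.
                replacement = NAMED.getOrDefault(body, m.group());
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(unescape("1 &lt; 2 &amp;&amp; 3 &gt; 2"));
    }
}
```

The input is scanned in a single pass, so a double-escaped sequence such as "&amp;lt;" comes out as "&lt;" rather than "<". Note that "&nbsp;" maps to the non-breaking space U+00A0 here, not to an ordinary space. The sketch needs Java 9 or later because of Map.of and the StringBuilder overload of appendReplacement.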