Basically, I would like to decode a given HTML document and replace all special characters (entities), such as "&nbsp;" -> " " and "&gt;" -> ">".
In .NET we can use HttpUtility.HtmlDecode.
What's the equivalent function in Java?
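I have used StringEscapeUtils.unescapeHtml4() from Apache Commons for this (org.apache.commons.text.StringEscapeUtils in Commons Text; the older org.apache.commons.lang3.StringEscapeUtils in Commons Lang 3 is deprecated in its favor):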
Unescapes a string containing entity escapes to a string containing the actual Unicode characters corresponding to the escapes. Supports HTML 4.0 entities.
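If adding a Commons dependency is not an option, the idea can be sketched with the standard library alone. This is a minimal, hand-rolled decoder, not how StringEscapeUtils is implemented: the class HtmlEntityDecoder and its unescape method are hypothetical names, only a handful of named entities are included (the library covers the full HTML 4.0 set), and Map.of assumes Java 9 or later. Decimal (&#62;) and hexadecimal (&#x3E;) numeric references are handled generically; unknown or malformed entities are left as-is.

```java
import java.util.Map;

/**
 * Hypothetical, dependency-free HTML entity decoder (a sketch, not a library API).
 * Decodes numeric references and a deliberately small table of named entities
 * in a single left-to-right pass; anything it does not recognise is copied through.
 */
final class HtmlEntityDecoder {

    // Only a few named entities for illustration; a real decoder needs the full table.
    private static final Map<String, String> NAMED = Map.of(
            "nbsp", "\u00A0", // no-break space, not an ordinary space
            "gt", ">",
            "lt", "<",
            "amp", "&",
            "quot", "\"",
            "apos", "'");

    private HtmlEntityDecoder() {}

    static String unescape(String input) {
        StringBuilder out = new StringBuilder(input.length());
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            // Look for a terminating ';' only when we are at a possible entity start.
            int semi = (c == '&') ? input.indexOf(';', i + 1) : -1;
            if (semi > i + 1) {
                String decoded = decodeEntity(input.substring(i + 1, semi));
                if (decoded != null) {
                    out.append(decoded);
                    i = semi + 1; // skip past the whole entity
                    continue;
                }
            }
            out.append(c); // plain character or unrecognised '&'
            i++;
        }
        return out.toString();
    }

    // Returns the text for one entity body (without '&' and ';'), or null if unknown.
    private static String decodeEntity(String name) {
        if (name.startsWith("#")) {
            try {
                boolean hex = name.length() > 1
                        && (name.charAt(1) == 'x' || name.charAt(1) == 'X');
                int codePoint = hex
                        ? Integer.parseInt(name.substring(2), 16)
                        : Integer.parseInt(name.substring(1), 10);
                return new String(Character.toChars(codePoint));
            } catch (IllegalArgumentException e) {
                return null; // malformed number or invalid code point
            }
        }
        return NAMED.get(name);
    }

    public static void main(String[] args) {
        System.out.println(unescape("5 &gt; 3 &amp;&amp; 2 &lt; 4"));
        System.out.println(unescape("&#72;&#x69;"));
    }
}
```

Because the scan is a single pass, "&amp;gt;" decodes to the literal text "&gt;" rather than ">", which is the correct behaviour for escaped ampersands. Note also that &nbsp; becomes U+00A0 (no-break space); replace it with a plain space afterwards if that is what you need.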