Convert HTML Character Back to Text Using Java Standard Library

Cheok Yan Cheng picture Cheok Yan Cheng · Mar 1, 2009 · Viewed 117.7k times · Source

I would like to convert some HTML characters back to text using Java Standard Library. I was wondering whether any library would achieve my purpose?

/**
 * @param args the command line arguments
 */
public static void main(String[] args) {
    // TODO code application logic here

    // "Happy & Sad" in HTML form.
    String s = "Happy & Sad";
    System.out.println(s);

    try {
        // Change to "Happy & Sad". DOESN'T WORK!
        s = java.net.URLDecoder.decode(s, "UTF-8");
        System.out.println(s);
    } catch (UnsupportedEncodingException ex) {

    }
}

Answer

Bill.D picture Bill.D · Mar 1, 2009

I think the Apache Commons Lang library's StringEscapeUtils.unescapeHtml3() and unescapeHtml4() methods are what you are looking for. See https://commons.apache.org/proper/commons-text/javadocs/api-release/org/apache/commons/text/StringEscapeUtils.html.