A plain JavaScript way to decode HTML entities that works in both browsers and Node

Henry He · May 26, 2017 · Viewed 18.5k times

How can I decode HTML entities like &nbsp; and &#39; back to their original characters?

In browsers we can create a DOM element to do the trick, or we can use a library such as he.
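The DOM trick is often written along the following lines. This is only a minimal sketch: decodeWithDom is a hypothetical name, and the textarea variant shown here is one common form of the technique, not necessarily the one the linked answer uses. It also shows why the approach does not carry over to plain Node.js, where there is no document object.

```javascript
// Hypothetical helper: let the browser's HTML parser decode the entities.
function decodeWithDom(encodedString) {
    // There is no DOM outside the browser (e.g. in plain Node.js),
    // so signal that with null instead of throwing a ReferenceError.
    if (typeof document === "undefined") {
        return null;
    }
    var textarea = document.createElement("textarea");
    // A textarea's content is parsed as text with entities decoded, never as markup.
    textarea.innerHTML = encodedString;
    return textarea.value;
}
```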

In Node.js we can use a third-party library such as html-entities.

What if we want to use plain JavaScript to do the job?

There are many similar questions and useful answers on Stack Overflow, but I couldn't find an approach that works in both browsers and Node.js, so I'd like to share mine.

I have posted my approach as an answer below. I hope it helps someone. :)

Answer

Henry He · May 26, 2017

For HTML entities such as &nbsp; &lt; &gt; &#39;, and even numeric references to Chinese characters, I suggest using this function (inspired by some other answers):

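function decodeEntities(encodedString) {
    // Named entities this function understands; any other entity is left untouched.
    var translate = {
        "nbsp": " ",   // decoded to a plain space; use "\u00A0" to keep it non-breaking
        "amp" : "&",
        "quot": "\"",
        "lt"  : "<",
        "gt"  : ">"
    };
    // Match named (&amp;) and decimal (&#39;) entities in a single pass, so that
    // "&amp;#39;" decodes to "&#39;" instead of being decoded twice.
    var translate_re = /&(?:(nbsp|amp|quot|lt|gt)|#(\d+));/g;
    return encodedString.replace(translate_re, function(match, entity, numStr) {
        if (entity) {
            return translate[entity];
        }
        var num = parseInt(numStr, 10);
        // fromCodePoint also handles code points above 0xFFFF; out-of-range values stay as-is.
        return num <= 0x10FFFF ? String.fromCodePoint(num) : match;
    });
}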

This implementation also works in Node.js.

decodeEntities("&#21704;&#21704;&nbsp;&#39;&#36825;&#20010;&#39;&amp;&quot;&#37027;&#20010;&quot;&#22909;&#29609;&lt;&gt;") //哈哈 '这个'&"那个"好玩<>

As a new user, I only have 1 reputation :(

I can't comment on or answer existing posts, so this is the only thing I can do for now.

Edit 1

I think this answer works even better than mine, although no one has upvoted it.