Java String.getBytes("UTF-8") JavaScript equivalent

user429620 · Apr 4, 2014

I have this string in Java, "test.message", and convert it to bytes:

    String plaintext = "test.message";
    byte[] bytes = plaintext.getBytes("UTF-8");
    // result: [116, 101, 115, 116, 46, 109, 101, 115, 115, 97, 103, 101]

If I do the same thing in JavaScript:

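    // Convert a JavaScript string to an array of UTF-8 byte values.
    function stringToByteArray(str) {
        // encodeURIComponent emits a %XX escape for each UTF-8 byte of a
        // non-ASCII character; unescape turns each %XX back into one char
        // whose code is that byte value.
        str = unescape(encodeURIComponent(str));

        var bytes = new Array(str.length);
        for (var i = 0; i < str.length; ++i)
            bytes[i] = str.charCodeAt(i);

        return bytes;
    }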

I get:

    [7, 163, 140, 72, 178, 72, 244, 241, 149, 43, 67, 124]

I was under the impression that the unescape(encodeURIComponent()) would correctly translate the string to UTF-8. Is this not the case?

Reference:

http://ecmanaut.blogspot.be/2006/07/encoding-decoding-utf8-in-javascript.html
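For comparison, a quick standalone check of the same unescape(encodeURIComponent()) approach suggests it does yield the expected UTF-8 bytes for "test.message". This assumes a browser or Node.js, where the legacy unescape global still exists. If that holds, the [7, 163, ...] output more likely came from a different string reaching the function than from the conversion itself; that is a guess, since the calling code isn't shown.

```javascript
// Standalone copy of the question's conversion, for testing outside an object literal.
function stringToByteArray(str) {
  // %XX escapes from encodeURIComponent are UTF-8 bytes; unescape maps each
  // %XX to a single char with code XX, so charCodeAt recovers the byte.
  var binary = unescape(encodeURIComponent(str));
  var bytes = new Array(binary.length);
  for (var i = 0; i < binary.length; ++i) {
    bytes[i] = binary.charCodeAt(i);
  }
  return bytes;
}

var ascii = stringToByteArray("test.message");
// ascii: [116, 101, 115, 116, 46, 109, 101, 115, 115, 97, 103, 101]
var accented = stringToByteArray("\u00e9");
// accented: [195, 169]  (U+00E9 takes two bytes in UTF-8)
```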

Answer

Kevin Hakanson · Sep 1, 2014

You can use TextEncoder, which is part of the Encoding Living Standard. According to the Encoding API entry on the Chromium Dashboard, it has shipped in Firefox and will ship in Chrome 38. There is also a text-encoding polyfill available.
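The polyfill's code isn't reproduced here. As a rough illustration of what any fallback without TextEncoder has to do, here is a minimal hand-written UTF-8 encoder. The name utf8Encode is hypothetical (not from the polyfill), and the sketch assumes ES2015's String.prototype.codePointAt is available.

```javascript
// Hypothetical helper: a minimal UTF-8 encoder sketch (not the polyfill's code).
// Assumes ES2015 String.prototype.codePointAt.
function utf8Encode(str) {
  var bytes = [];
  for (var i = 0; i < str.length; i++) {
    var cp = str.codePointAt(i);
    if (cp > 0xffff) {
      i++; // a surrogate pair occupies two UTF-16 code units
    } else if (cp >= 0xd800 && cp <= 0xdfff) {
      cp = 0xfffd; // lone surrogate: emit U+FFFD, as TextEncoder does
    }
    if (cp < 0x80) {
      bytes.push(cp); // 1 byte: 0xxxxxxx
    } else if (cp < 0x800) {
      bytes.push(0xc0 | (cp >> 6), 0x80 | (cp & 0x3f)); // 2 bytes
    } else if (cp < 0x10000) {
      bytes.push(0xe0 | (cp >> 12),
                 0x80 | ((cp >> 6) & 0x3f),
                 0x80 | (cp & 0x3f)); // 3 bytes
    } else {
      bytes.push(0xf0 | (cp >> 18),
                 0x80 | ((cp >> 12) & 0x3f),
                 0x80 | ((cp >> 6) & 0x3f),
                 0x80 | (cp & 0x3f)); // 4 bytes
    }
  }
  return bytes;
}
```

For example, utf8Encode("\u20ac") (the euro sign) returns [226, 130, 172], and utf8Encode("test.message") returns the same twelve values as the Java code.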

The JavaScript code sample below returns a Uint8Array filled with the values you expect.

    var s = "test.message";
    var encoder = new TextEncoder(); // encodes to UTF-8
    var bytes = encoder.encode(s);   // returns a Uint8Array
    // bytes: [116, 101, 115, 116, 46, 109, 101, 115, 115, 97, 103, 101]
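One caveat when comparing against Java output: Java's byte is signed, so bytes of 128 and above print as negative numbers there, while a Uint8Array holds 0..255. This only matters for non-ASCII input. A small sketch of the mapping, using a hypothetical helper name toJavaSignedBytes:

```javascript
// Hypothetical helper: reinterpret UTF-8 byte values (0..255) the way Java
// prints a byte[] (signed, -128..127). Works on a Uint8Array or plain array.
function toJavaSignedBytes(utf8Bytes) {
  return Array.from(utf8Bytes, function (b) {
    return b > 127 ? b - 256 : b;
  });
}

var javaStyle = toJavaSignedBytes(new TextEncoder().encode("\u00e9"));
// javaStyle: [-61, -87], the values Java shows for "\u00e9".getBytes("UTF-8")
```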