JavaScript Unicode normalization

Matty picture Matty · Oct 14, 2011 · Viewed 13.6k times · Source

I'm under the impression that JavaScript interpreter assumes that the source code it is interpreting has already been normalized. What, exactly does the normalizing? It can't be the text editor, otherwise the plaintext representation of the source would change. Is there some "preprocessor" that does the normalization?

Answer

bobince picture bobince · Oct 14, 2011

No, there is no Unicode Normalization feature used automatically on—or even available to—JavaScript as per ECMAScript 5. All characters remain unchanged as their original code points, potentially in a non-Normal Form.

eg try:

<script type="text/javascript">
    var a= 'café';          // caf\u00E9
    var b= 'café';          // cafe\u0301
    alert(a+' '+a.length);  // café 4
    alert(b+' '+b.length);  // café 5
    alert(a==b);            // false
</script>

Update: ECMAScript 6 will introduce Unicode normalization for JavaScript strings.