JavaScript + Unicode regexes

Amit · Nov 11, 2008

How can I use Unicode-aware regular expressions in JavaScript?

For example, there should be something akin to \w that can match any code point in the Letter or Mark categories (not just the ASCII ones), and hopefully filters like [[P*]] for punctuation, etc.

Answer

Tomalak · Nov 11, 2008

Situation for ES 6

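The upcoming ECMAScript language specification, edition 6, includes Unicode-aware regular expressions. Support must be enabled with the u flag on the regex, which makes the engine match whole code points (including astral characters such as emoji) instead of individual UTF-16 code units, and enables \u{...} escapes. See Unicode-aware regular expressions in ES6. Note that \w stays ASCII-only even with u; the category filters asked about here (\p{L}, \p{M}, \p{P}, ...) are Unicode property escapes, which were standardized later, in ES2018, and also require the u flag.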
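As a rough illustration, assuming an ES2018+ engine such as a current Node.js or browser (the u flag itself is ES2015, the \p{...} escapes are ES2018), the snippet below contrasts matching with and without u and then builds the "letters or marks" and punctuation classes from the question. The names emoji, wordRe and punctRe are just illustrative.

```javascript
// Sketch assuming an ES2018+ engine: the `u` flag is ES2015,
// Unicode property escapes (\p{...}) are ES2018.
const emoji = "\u{1F600}"; // one code point, stored as two UTF-16 code units

console.log(emoji.length);        // 2
console.log(/^.$/.test(emoji));   // false: without `u`, "." matches one code unit
console.log(/^.$/u.test(emoji));  // true: with `u`, "." matches one code point

// \w remains ASCII-only, even with `u`.
console.log(/^\w+$/u.test("café")); // false

// Letters-or-marks class (a "Unicode \w" in the question's sense) and a
// punctuation class: \p{P} is the Punctuation category, covering all of
// its P* subcategories (Pc, Pd, Ps, Pe, Pi, Pf, Po).
const wordRe = /^[\p{L}\p{M}]+$/u;
const punctRe = /\p{P}/gu;

console.log(wordRe.test("café"));         // true: é is a letter
console.log(wordRe.test("cafe\u0301"));   // true: U+0301 is a combining mark (Mn)
console.log(wordRe.test("naïve!"));       // false: "!" is punctuation
console.log("¿Qué? ¡Sí!".match(punctRe).join(" ")); // ¿ ? ¡ !
```

Without the u flag, a pattern containing \p{L} is not a property escape at all (it is read as a literal "p{L}"), so the flag is mandatory here.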

Until ES 6 is finished and widely adopted by browser vendors, you're still on your own, though.

Update: There is now a transpiler named regexpu that translates ES6 Unicode regular expressions into equivalent ES5. It can be used as part of your build process. Try it out online.
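If you are not transpiling, one way to cope with uneven support at runtime is to feature-detect. The sketch below uses a hypothetical helper, supportsRegex (not a standard API), written in ES5 syntax so it can run on older engines too. Patterns are passed to the RegExp constructor as strings on purpose: a regex literal such as /\p{L}/u is a SyntaxError at parse time on an engine that lacks the feature, before any try/catch gets a chance to run.

```javascript
// Hypothetical helper (not a standard API): returns true if this engine can
// compile the given pattern source with the given flags.
function supportsRegex(source, flags) {
  try {
    new RegExp(source, flags); // throws SyntaxError for unknown flags or syntax
    return true;
  } catch (e) {
    return false;
  }
}

var hasUnicodeFlag = supportsRegex("", "u");           // ES2015 `u` flag
var hasPropertyEscapes = supportsRegex("\\p{L}", "u"); // ES2018 \p{...}

// Illustrative choice: fall back to ASCII letters when \p{...} is missing.
var letters = hasPropertyEscapes
  ? new RegExp("^[\\p{L}\\p{M}]+$", "u")
  : /^[A-Za-z]+$/;

console.log(hasUnicodeFlag, hasPropertyEscapes, letters.test("caf\u00E9"));
```

The ASCII fallback is only a placeholder; in practice you would substitute a hand-built class or a transpiled pattern there.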

Situation for ES 5 and below

Even though JavaScript strings hold Unicode text (as UTF-16 code units), ES5 regular expressions do not implement Unicode-aware character classes: \w, \d and \b only know about ASCII. They also have no concept of POSIX character classes or Unicode blocks/sub-ranges.
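In ES5, one workaround is to spell out the code-point ranges yourself with \uXXXX escapes. The class below (latinWord, a hypothetical name) is a hand-picked illustration, not a general \p{L} replacement: it covers only ASCII letters plus the accented letters of the Latin-1 Supplement and all of Latin Extended-A. Characters outside the Basic Multilingual Plane would additionally need surrogate-pair alternations, because ES5 patterns only see UTF-16 code units.

```javascript
// ES5-compatible sketch: no `u` flag, no \p{...}.
var asciiWord = /^\w+$/; // \w is [A-Za-z0-9_] only

// Illustrative, not exhaustive: ASCII letters, Latin-1 Supplement letters
// U+00C0..U+00FF minus U+00D7 (x sign) and U+00F7 (division sign),
// and the whole Latin Extended-A block U+0100..U+017F.
var latinWord = /^[A-Za-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u017F]+$/;

console.log(asciiWord.test("cafe"));               // true
console.log(asciiWord.test("caf\u00E9"));          // false ("café")
console.log(latinWord.test("caf\u00E9"));          // true
console.log(latinWord.test("\u0141\u00F3d\u017A")); // true ("Łódź")
console.log(latinWord.test("\u041F\u0440\u0438")); // false (Cyrillic "При")
```

Because the class is a fixed list of ranges, it has to be extended by hand for every script you need, which is exactly the maintenance burden that property escapes (or a transpiler like regexpu) remove.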