Regex to remove non-letter characters but keep accented letters

devjs11 · Dec 1, 2011

I have strings in Spanish and other languages that may contain generic special characters such as (, ), and *, which I need to remove. The problem is that the strings may also contain language-specific characters such as ñ, á, ó, and í, which need to remain. I am trying to do it with a regular expression as follows:

var desired = stringToReplace.replace(/[^\w\s]/gi, '');

Unfortunately, it removes all special characters, including the language-specific ones. I am not sure how to avoid that. Could someone suggest a fix?
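For context, the behavior can be reproduced in isolation. In JavaScript, `\w` is shorthand for `[A-Za-z0-9_]` regardless of locale, so accented letters fall into the negated class and are stripped along with the punctuation. A minimal sketch, using a made-up sample string (`sample` is not from the original post):

```javascript
// The pattern from the question: remove anything that is not \w or whitespace.
const asciiOnly = /[^\w\s]/gi;

// Hypothetical sample input, chosen only for illustration.
const sample = "año (nuevo)* más";

// \w covers only ASCII letters, digits, and underscore,
// so ñ and á are removed together with ( ) and *.
const stripped = sample.replace(asciiOnly, "");
console.log(stripped);
// ao nuevo ms
```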

Answer

Tim Down · Oct 16, 2012

I would suggest using Steven Levithan's excellent XRegExp library and its Unicode plug-in.

Here's an example that strips every character that is neither whitespace nor a Latin-script character from a string: http://jsfiddle.net/b3awZ/1/

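// \p{Latin} matches characters in the Latin script, including ñ, á, ó, í
var regex = XRegExp("[^\\s\\p{Latin}]+", "g");
var str = "¿Me puedes decir la contraseña de la Wi-Fi?";
var replaced = XRegExp.replace(str, regex, "");
// replaced: "Me puedes decir la contraseña de la WiFi"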
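In engines that support ES2018 Unicode property escapes, a similar result may be achievable without a library. The following is only a sketch under that assumption, not part of the XRegExp approach above. `stripNonLetters` is a hypothetical helper name, and the choice of `\p{L}` (letters of any script) plus `\p{M}` (combining marks) is an assumption about what should be kept:

```javascript
// Hypothetical helper: keep letters of any script, combining marks, and whitespace.
function stripNonLetters(input) {
  // \p{L} = any Unicode letter
  // \p{M} = combining marks (needed when accents are stored as separate code points)
  // \s    = whitespace
  // The "u" flag is required for \p{...} escapes to be recognized.
  return input.replace(/[^\p{L}\p{M}\s]+/gu, "");
}

console.log(stripNonLetters("¿Me puedes decir la contraseña de la Wi-Fi?"));
// Me puedes decir la contraseña de la WiFi
```

To restrict the match to the Latin script, as the XRegExp example does, `\p{Script=Latin}` could be used in place of `\p{L}`. Note that both versions also remove digits and the hyphen in "Wi-Fi"; add them to the character class if they should be kept.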

See also this answer by Steven Levithan himself:

Regular expression Spanish and Arabic words