How do I remove accentuated characters from a string? Especially in IE6, I had something like this:
accentsTidy = function(s){
var r=s.toLowerCase();
r = r.replace(new RegExp(/\s/g),"");
r = r.replace(new RegExp(/[àáâãäå]/g),"a");
r = r.replace(new RegExp(/æ/g),"ae");
r = r.replace(new RegExp(/ç/g),"c");
r = r.replace(new RegExp(/[èéêë]/g),"e");
r = r.replace(new RegExp(/[ìíîï]/g),"i");
r = r.replace(new RegExp(/ñ/g),"n");
r = r.replace(new RegExp(/[òóôõö]/g),"o");
r = r.replace(new RegExp(/œ/g),"oe");
r = r.replace(new RegExp(/[ùúûü]/g),"u");
r = r.replace(new RegExp(/[ýÿ]/g),"y");
r = r.replace(new RegExp(/\W/g),"");
return r;
};
but IE6 bugs me, seems it doesn't like my regular expression.
With ES2015/ES6 String.prototype.normalize(),
const str = "Crème Brulée"
str.normalize("NFD").replace(/[\u0300-\u036f]/g, "")
> "Creme Brulee"
Two things are happening here:
normalize()
ing to NFD
Unicode normal form decomposes combined graphemes into the combination of simple ones. The è
of Crème
ends up expressed as e
+ ̀
.See comment for performance testing.
Alternatively, if you just want sorting
Intl.Collator has sufficient support ~95% right now, a polyfill is also available here but I haven't tested it.
const c = new Intl.Collator();
["creme brulee", "crème brulée", "crame brulai", "crome brouillé",
"creme brulay", "creme brulfé", "creme bruléa"].sort(c.compare)
["crame brulai", "creme brulay", "creme bruléa", "creme brulee",
"crème brulée", "creme brulfé", "crome brouillé"]
["creme brulee", "crème brulée", "crame brulai", "crome brouillé"].sort((a,b) => a>b)
["crame brulai", "creme brulee", "crome brouillé", "crème brulée"]