Regex to ignore accents? PHP

eSinxoll · May 7, 2012

Is there any way to make a regex that ignores accents?

For example:

preg_replace("/$word/i", "<b>$word</b>", $str);

The "i" modifier makes the match case-insensitive, but is there any way to match, for example, java with Jávã?

I tried making a copy of $str, converting it to an accent-free string, and finding the index of every occurrence. But the indexes in the two strings seem to differ, even though the only change is the removed accents.

(I did some research, but all I could find was how to remove accents from a string.)

Answer

user267885 · May 7, 2012

I don't think there is such a way; it would be locale-dependent. You probably want the "/u" modifier first, so that PCRE treats both the pattern and the subject string as UTF-8.
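To see what the "u" modifier changes, here is a small sketch, assuming the script file itself is saved as UTF-8 (so "á" is stored as two bytes). Without "u", PCRE works on bytes: a single "." cannot cover "á", and a class such as [áé] is really a set of individual bytes.

```php
<?php
// Assumes this file is saved as UTF-8, so "á" is two bytes.
$s = "á";

// Without /u, "." matches exactly one byte, so it cannot cover "á".
var_dump(preg_match('/^.$/', $s));   // int(0)

// With /u, pattern and subject are read as UTF-8 characters.
var_dump(preg_match('/^.$/u', $s));  // int(1)

// Without /u, [áé] is a set of bytes and matches each byte of "á" separately.
var_dump(preg_match_all('/[áé]/', $s));   // int(2)
// With /u, it matches the whole character once.
var_dump(preg_match_all('/[áé]/u', $s));  // int(1)
```

This is why the accent classes in the answer only behave correctly together with "/u".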

I would probably do something like this.

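function prepare($pattern)
{
   // Each plain letter becomes a class that also accepts its accented forms.
   $replacements = array("a" => "[aáàäâã]",
                         "e" => "[eéèëê]",
                         // ... add the remaining letters as needed
                        );
   // Escape regex metacharacters first; strtr() does not rescan its own output.
   return strtr(preg_quote($pattern, "/"), $replacements);
}

$result = preg_replace("/(" . prepare($word) . ")/ui", '<b>$1</b>', $str);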
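A fuller, self-contained sketch of the same character-class idea. The helper names `accent_insensitive_pattern()` and `highlight()` and the letter groups are illustrative assumptions, not a standard API; it assumes UTF-8 strings and the mbstring extension. It also lowercases each letter of the search word and looks up its group, so searching for "Jávã" matches "java" as well.

```php
<?php
/**
 * Hypothetical helper: build a regex that matches $word while ignoring accents.
 * Assumes UTF-8 input; only the letters listed in $groups are covered.
 */
function accent_insensitive_pattern(string $word): string
{
    // Each group lists a base letter followed by its accented variants.
    $groups = ['aáàäâã', 'eéèëê', 'iíìïî', 'oóòöôõ', 'uúùüû', 'cç', 'nñ'];

    $pattern = '';
    // Split the word into UTF-8 characters, not bytes.
    foreach (preg_split('//u', $word, -1, PREG_SPLIT_NO_EMPTY) as $char) {
        $lower = mb_strtolower($char, 'UTF-8');
        $class = null;
        foreach ($groups as $group) {
            if (mb_strpos($group, $lower, 0, 'UTF-8') !== false) {
                $class = '[' . $group . ']';
                break;
            }
        }
        // Characters outside the groups are escaped and matched literally.
        $pattern .= $class ?? preg_quote($char, '/');
    }
    // /u: treat as UTF-8; /i: also match uppercase forms.
    return '/' . $pattern . '/ui';
}

// Hypothetical helper: wrap every match in <b>, keeping the original text.
function highlight(string $word, string $str): string
{
    return preg_replace(accent_insensitive_pattern($word), '<b>$0</b>', $str);
}

echo highlight('java', 'I like Jávã and JAVA.'), "\n";
// I like <b>Jávã</b> and <b>JAVA</b>.
```

Using `$0` in the replacement keeps each match exactly as it appeared in the subject, accents and case included. The groups would need extending for other languages.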

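In your case the indexes were different because your string was probably UTF-8, which uses more than one byte for each accented character. Unless you used the mbstring functions, the offsets you got were byte offsets, so every removed accent shifted all the positions after it.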
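The original approach (search an accent-free copy, then cut the original at the same positions) can also work, provided every offset is a character offset and stripping never changes the number of characters. A sketch under those assumptions: `strip_accents()` and `highlight_by_offsets()` are hypothetical helpers, the accent map is deliberately incomplete, and the mbstring extension is required.

```php
<?php
// strlen() counts bytes; mb_strlen() counts characters.
$original = 'Jávã';
echo strlen($original), ' ', mb_strlen($original, 'UTF-8'), "\n"; // 6 4

// Hypothetical one-to-one map: one accented character -> one plain character,
// so the stripped copy has exactly as many characters as the original.
function strip_accents(string $s): string
{
    return strtr($s, [
        'á' => 'a', 'à' => 'a', 'ä' => 'a', 'â' => 'a', 'ã' => 'a',
        'Á' => 'A', 'À' => 'A', 'Ä' => 'A', 'Â' => 'A', 'Ã' => 'A',
        'é' => 'e', 'è' => 'e', 'ë' => 'e', 'ê' => 'e',
        'É' => 'E', 'È' => 'E', 'Ë' => 'E', 'Ê' => 'E',
    ]);
}

function highlight_by_offsets(string $word, string $str): string
{
    $haystack = strip_accents($str);
    $needle = strip_accents($word);
    $len = mb_strlen($needle, 'UTF-8');
    if ($len === 0) {
        return $str; // nothing to highlight; also avoids an endless loop
    }
    $out = '';
    $pos = 0;
    // mb_stripos() returns a character offset, valid in both strings.
    while (($hit = mb_stripos($haystack, $needle, $pos, 'UTF-8')) !== false) {
        $out .= mb_substr($str, $pos, $hit - $pos, 'UTF-8')
              . '<b>' . mb_substr($str, $hit, $len, 'UTF-8') . '</b>';
        $pos = $hit + $len;
    }
    return $out . mb_substr($str, $pos, null, 'UTF-8');
}

echo highlight_by_offsets('java', 'I like Jávã and JAVA.'), "\n";
// I like <b>Jávã</b> and <b>JAVA</b>.
```

The key point is that searching and cutting both use mb_* functions, so the positions agree; mixing strpos() offsets with the stripped copy reproduces the mismatch described in the question.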