How to remove diacritics from text?

user188962 picture user188962 · Nov 20, 2009 · Viewed 27.4k times · Source

I am making a swedish website, and swedish letters are å, ä, and ö.

I need to make a string entered by a user to become url-safe with PHP.

Basically, need to convert all characters to underscore, all EXCEPT these:

 A-Z, a-z, 1-9

and all swedish should be converted like this:

'å' to 'a' and 'ä' to 'a' and 'ö' to 'o' (just remove the dots above).

The rest should become underscores as I said.

Im not good at regular expressions so I would appreciate the help guys!

Thanks

NOTE: NOT URLENCODE...I need to store it in a database... etc etc, urlencode wont work for me.

Answer

user1518659 picture user1518659 · Oct 12, 2012

This should be useful which handles almost all the cases.

function Unaccent($string)
{
    return preg_replace('~&([a-z]{1,2})(?:acute|cedil|circ|grave|lig|orn|ring|slash|th|tilde|uml|caron);~i', '$1', htmlentities($string, ENT_COMPAT, 'UTF-8'));
}