From Wikipedia:
A slug is the part of a URL which identifies a page using human-readable keywords.
To make the URL easier for users to type, special characters are often removed or replaced as well. For instance, accented characters are usually replaced by letters from the English alphabet; punctuation marks are generally removed; and spaces (which have to be encoded as %20 or +) are replaced by dashes (-) or underscores (_), which are more aesthetically pleasing.
I developed a photo-sharing website on which users can upload, share and view photos.
All pages are generated automatically, so I have no control over the titles. Because a photo title or a user name may contain accented characters or spaces, I needed a function that creates slugs automatically and keeps the URLs readable.
I wrote the following function, which replaces accented characters (âèêëçî), removes punctuation and other unwanted characters (#@&~^!), and turns spaces into dashes.
php:
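function sluggable($str) {
    $accented = 'àáâãäåòóôõöøèéêëðçìíîïùúûüñšž';
    $plain    = 'aaaaaaooooooeeeeeciiiiuuuunsz';
    // Build a character-to-character map. Passing the two strings directly to
    // strtr() would work byte by byte and corrupt UTF-8 input.
    $map = array_combine(
        preg_split('//u', $accented, -1, PREG_SPLIT_NO_EMPTY),
        str_split($plain)
    );
    // mb_strtolower() also lowercases accented capitals such as "É".
    $str = mb_strtolower($str, 'UTF-8');
    $str = strtr($str, $map);
    // Drop everything that is not a letter, a digit or whitespace.
    $str = preg_replace('/[^a-z0-9\s]/', '', $str);
    $str = trim($str);
    // Turn each run of whitespace into a single dash.
    $str = preg_replace('/\s+/', '-', $str);
    return $str;
}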
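A note on strtr(): its three-argument form, strtr($str, $from, $to), translates byte by byte, so it only behaves as intended with single-byte encodings such as ISO-8859-1. The array form replaces whole substrings, which makes it safe for multibyte UTF-8 characters. Here is a minimal sketch of the array form, assuming the source file is saved as UTF-8. The helper name replace_accents and its short map are made up for illustration only.

```php
<?php
// Hypothetical helper: shows the array form of strtr(), which replaces whole
// substrings and therefore handles multibyte UTF-8 characters correctly.
function replace_accents($str)
{
    // Deliberately small map; a real one would cover many more characters.
    $map = array('à' => 'a', 'é' => 'e', 'ç' => 'c', 'î' => 'i', 'ñ' => 'n');
    return strtr($str, $map);
}

echo replace_accents('façon née'); // facon nee
```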
I like the php-slugs solution on Google Code, but if you want a simpler one that works with UTF-8:
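function format_uri( $string, $separator = '-' )
{
    // Matches HTML entities of accented letters and captures the base letter(s).
    $accents_regex = '~&([a-z]{1,2})(?:acute|cedil|circ|grave|lig|orn|ring|slash|th|tilde|uml);~i';
    $special_cases = array( '&' => 'and', "'" => '' );
    $string = mb_strtolower( trim( $string ), 'UTF-8' );
    $string = str_replace( array_keys( $special_cases ), array_values( $special_cases ), $string );
    // Encode accented characters as entities, then keep only their base letter.
    $string = preg_replace( $accents_regex, '$1', htmlentities( $string, ENT_QUOTES, 'UTF-8' ) );
    // Replace every remaining non-alphanumeric character with the separator.
    $string = preg_replace( '/[^a-z0-9]/u', $separator, $string );
    // Collapse repeated separators; preg_quote() keeps separators such as "^" or "]" from breaking the pattern.
    $string = preg_replace( '/(?:' . preg_quote( $separator, '/' ) . ')+/u', $separator, $string );
    return $string;
}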
So
echo format_uri("#@&~^!âèêëçî");
outputs
-and-aeeeci
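The result starts with a dash because the leading punctuation is turned into separators before the repeated separators are collapsed. If you do not want that, one option is to trim the separator from both ends of the finished slug. Here is a minimal sketch. The helper name trim_slug is hypothetical and not part of the function above.

```php
<?php
// Hypothetical helper: removes leading and trailing separators from a slug
// that has already been generated. Assumes a single-character separator.
function trim_slug($slug, $separator = '-')
{
    return trim($slug, $separator);
}

echo trim_slug('-and-aeeeci'); // and-aeeeci
```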
Please comment if you find any errors.