Multi-byte safe wordwrap() function for UTF-8

philfreo picture philfreo · Sep 29, 2010 · Viewed 13.5k times · Source

PHP's wordwrap() function doesn't work correctly for multi-byte strings like UTF-8.

There are a few examples of mb safe functions in the comments, but with some different test data they all seem to have some problems.

The function should take the exact same parameters as wordwrap().

Specifically be sure it works to:

  • cut mid-word if $cut = true, don't cut mid-word otherwise
  • not insert extra spaces in words if $break = ' '
  • also work for $break = "\n"
  • work for ASCII, and all valid UTF-8

Answer

Fosfor picture Fosfor · Feb 14, 2011

I haven't found any working code for me. Here is what I've written. For me it is working, thought it is probably not the fastest.

function mb_wordwrap($str, $width = 75, $break = "\n", $cut = false) {
    $lines = explode($break, $str);
    foreach ($lines as &$line) {
        $line = rtrim($line);
        if (mb_strlen($line) <= $width)
            continue;
        $words = explode(' ', $line);
        $line = '';
        $actual = '';
        foreach ($words as $word) {
            if (mb_strlen($actual.$word) <= $width)
                $actual .= $word.' ';
            else {
                if ($actual != '')
                    $line .= rtrim($actual).$break;
                $actual = $word;
                if ($cut) {
                    while (mb_strlen($actual) > $width) {
                        $line .= mb_substr($actual, 0, $width).$break;
                        $actual = mb_substr($actual, $width);
                    }
                }
                $actual .= ' ';
            }
        }
        $line .= trim($actual);
    }
    return implode($break, $lines);
}