How to convert UTF8 characters to numeric character entities in PHP

darkAsPitch · Sep 30, 2011

Is a translation of the below code at all possible using PHP?

The code below is written in JavaScript. It returns HTML with numeric character references where needed, e.g. smslån -> smsl&#229;n

I have been unsuccessful at creating a translation. This script looked like it may work, but it returns &aring; for å instead of &#229; as the JavaScript below does.

function toEntity() {
  var aa = document.form.utf.value;
  var bb = '';
  for(var i=0; i<aa.length; i++)
  {
    if(aa.charCodeAt(i)>127)
    {
      bb += '&#' + aa.charCodeAt(i) + ';';
    }
    else
    {
      bb += aa.charAt(i);
    }
  }
  document.form.entity.value = bb;
}

PHP's ord function sounds like it does the same thing as charCodeAt, but it does not. I get 195 for å using ord and 229 using charCodeAt. That, or I am having some incredibly difficult encoding problems.
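
For what it's worth, the two numbers are consistent with each other rather than a sign of a broken encoding: ord only looks at the first byte of the string, and å in UTF-8 is the two-byte sequence 0xC3 0xA5, while charCodeAt returns the UTF-16 code unit 0xE5. A minimal sketch of the difference, assuming PHP 7.2+ (for mb_ord and the "\u{...}" escape) with the mbstring extension loaded:

```php
<?php
// Assumption: PHP 7.2+ with mbstring; the escape keeps the example
// independent of the encoding of this source file.
$char = "\u{00E5}"; // å (U+00E5)

// ord() inspects only the first byte of the string.
$firstByte = ord($char); // 0xC3 = 195

// All bytes of the UTF-8 sequence for å: 0xC3 0xA5.
$bytes = array_values(unpack('C*', $char)); // [195, 165]

// mb_ord() decodes the whole sequence into one code point.
$codePoint = mb_ord($char, 'UTF-8'); // 0xE5 = 229, what charCodeAt reports

printf("%d | %s | %d\n", $firstByte, implode(',', $bytes), $codePoint);
// prints "195 | 195,165 | 229"
```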

Answer

phihag · Sep 30, 2011

Use mb_encode_numericentity:

$convmap = array(0x80, 0xffff, 0, 0xffff);
echo mb_encode_numericentity($utf8Str, $convmap, 'UTF-8');
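
As a rough sketch of how this might look wrapped up for reuse: each group of four integers in the conversion map is (start code point, end code point, offset, mask). The helper name utf8_to_numeric_entities below is made up for illustration, and the wider range 0x80..0x10FFFF is my own assumption for input that may contain characters outside the Basic Multilingual Plane; it is not part of the snippet above. Requires the mbstring extension and PHP 7.0+ for the type hints and "\u{...}" escapes.

```php
<?php
// Hypothetical helper, not a built-in. Requires the mbstring extension.
function utf8_to_numeric_entities(string $utf8Str): string
{
    // start, end, offset, mask: encode every code point from U+0080 up to
    // U+10FFFF, leaving plain ASCII untouched.
    $convmap = [0x80, 0x10FFFF, 0, 0x1FFFFF];

    return mb_encode_numericentity($utf8Str, $convmap, 'UTF-8');
}

echo utf8_to_numeric_entities("smsl\u{00E5}n"), "\n"; // prints "smsl&#229;n"
```

One difference from the JavaScript version: charCodeAt works on UTF-16 code units, so it would emit two entities (a surrogate pair) for a character above U+FFFF, whereas this sketch emits a single entity for the actual code point.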