PHP: DOMDocument writes UTF-8 strings as hexadecimal character references

ufk · Aug 26, 2010 · Viewed 16.7k times

When I try to write UTF-8 strings into an XML file using DOMDocument, it writes hexadecimal character references instead of the characters themselves.

For example, it writes:

&#x5D9;&#x5E8;&#x5D5;&#x5E9;&#x5DC;&#x5D9;&#x5DD;

instead of: ירושלים

Any ideas on how to resolve this?

Answer

Gordon · Aug 26, 2010

Ok, here you go:

$dom = new DOMDocument('1.0', 'utf-8');
$dom->appendChild($dom->createElement('root'));
$dom->documentElement->appendChild(new DOMText('ירושלים'));
echo $dom->saveXml();

will work fine, because in this case, the document you constructed will retain the encoding specified as the second argument:

<?xml version="1.0" encoding="utf-8"?>
<root>ירושלים</root>

However, once you load XML that does not declare an encoding into the document, you lose whatever you declared in the constructor, which means:

$dom = new DOMDocument('1.0', 'utf-8');
$dom->loadXml('<root/>'); // missing prolog
$dom->documentElement->appendChild(new DOMText('ירושלים'));
echo $dom->saveXml();

will not have an encoding of utf-8:

<?xml version="1.0"?>
<root>&#x5D9;&#x5E8;&#x5D5;&#x5E9;&#x5DC;&#x5D9;&#x5DD;</root>
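To see why the declaration disappears, you can inspect the document's encoding property before and after loading. This is only a minimal sketch using the same `<root/>` input; the variable names `$before` and `$after` are just for illustration:

```php
<?php
$dom = new DOMDocument('1.0', 'utf-8');
$before = $dom->encoding;   // encoding as set by the constructor

$dom->loadXML('<root/>');   // input has no XML declaration
$after = $dom->encoding;    // encoding of the freshly loaded document

var_dump($before, $after);
```

If the second value comes back empty, the loaded document carries no encoding, and saveXML() falls back to character references for non-ASCII text.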

So if you load something with loadXML, make sure it declares the encoding in its XML prolog:

$dom = new DOMDocument();
$dom->loadXml('<?xml version="1.0" encoding="utf-8"?><root/>');
$dom->documentElement->appendChild(new DOMText('ירושלים'));
echo $dom->saveXml();

and it will work as expected.

As an alternative, you can also specify the encoding after loading the document by setting its encoding property.
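A minimal sketch of that alternative, assuming the script file itself is saved as UTF-8 and reusing the prolog-less `<root/>` input from above:

```php
<?php
$dom = new DOMDocument();
$dom->loadXML('<root/>');     // no prolog, so no encoding is declared
$dom->encoding = 'utf-8';     // declare the encoding after loading
$dom->documentElement->appendChild(new DOMText('ירושלים'));

$xml = $dom->saveXML();       // serialize with the encoding set above
echo $xml;
```

With the property set before saving, the output should carry the encoding in its prolog and keep the Hebrew text as literal characters.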