How do I tell DOMDocument->load() what encoding I want it to use?

Lokkju · Aug 13, 2009 · Viewed 20.5k times

I search for and process XML files from elsewhere, and need to transform them with some XSLTs. No problem. Using PHP5 and the DOM library, everything's a snap. Worked fine, up till now. Today, funky characters were in the XML file -- "smart" quotes from Word, it looks like. Anyway, DOMDocument->load() complained about them, saying that the input wasn't valid UTF-8 and that I should specify the encoding.

Lo and behold, the encoding is not specified in these XML files. If I add in 'encoding="iso-8859-1"' to the header, it works fine. The rub is I have no control over these XML files.

Reading the file into a string, modifying its header and writing it back out to another location seems to be my only option, but I'd prefer to do it without having to use temporary copies of the XML files at all. Is there any way to simply tell the parser to parse them as if they were iso-8859-1?

Answer

nickf · Aug 13, 2009

Does this work for you?

$doc = new DOMDocument('1.0', 'iso-8859-1');
$doc->load($xmlPath);

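Edit: Since it appears that this doesn't work (the encoding passed to the constructor describes the document being created, not how load() decodes its input), what you could do instead is similar to your existing method but without the temp file. Read the XML file from your source using standard I/O (file_get_contents() or something), convert the string from ISO-8859-1 to UTF-8 (iconv() or mb_convert_encoding(); utf8_encode() does the same conversion but is deprecated as of PHP 8.2), and then use loadXML():

// Read the raw bytes; no temporary file is needed
$myXMLString = file_get_contents($xmlPath);
// With no encoding declared the parser assumes UTF-8, so convert ISO-8859-1 to UTF-8 first
$myXMLString = iconv('ISO-8859-1', 'UTF-8', $myXMLString);
$doc = new DOMDocument();
$doc->loadXML($myXMLString);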
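
If some of the incoming files do declare an encoding and others don't, converting every one of them would corrupt the files that are already labeled. Here is a rough sketch of one way to handle both cases. The helper name loadXmlWithFallbackEncoding() and its $fallbackEncoding parameter are made up for illustration and are not part of the DOM extension. It also assumes that files with no declaration contain single-byte text in the fallback encoding and have no byte order mark.

```php
<?php
// Hypothetical helper, not part of PHP's DOM extension: parse an XML string,
// treating it as $fallbackEncoding when the XML declaration names no encoding.
function loadXmlWithFallbackEncoding(string $xml, string $fallbackEncoding = 'ISO-8859-1'): DOMDocument
{
    // Look for an encoding pseudo-attribute in the XML declaration at the start.
    $hasDeclaredEncoding = preg_match('/^<\?xml[^>]*\bencoding\s*=/i', $xml) === 1;

    if (!$hasDeclaredEncoding) {
        // With no declared encoding the parser assumes UTF-8, so convert first.
        $converted = iconv($fallbackEncoding, 'UTF-8', $xml);
        if ($converted === false) {
            throw new RuntimeException("Could not convert from $fallbackEncoding to UTF-8");
        }
        $xml = $converted;
    }

    $doc = new DOMDocument();
    if (!$doc->loadXML($xml)) {
        throw new RuntimeException('XML could not be parsed');
    }
    return $doc;
}

// Usage with an in-memory sample; "\xE9" is the ISO-8859-1 byte for "é".
$doc = loadXmlWithFallbackEncoding("<?xml version=\"1.0\"?><root>caf\xE9</root>");
echo $doc->documentElement->textContent, "\n"; // DOM strings are always UTF-8
```

For real files you would pass file_get_contents($xmlPath) as the first argument. If the smart quotes really come from Word, the bytes may be Windows-1252 rather than ISO-8859-1, so it may be worth trying 'Windows-1252' as the fallback, assuming your iconv build knows that name.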