This is my code:
$oDom = new DOMDocument();
$oDom->loadHTML("èàéìòù");
echo $oDom->saveHTML();
This is the output:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body><p>Ã¨Ã Ã©Ã¬Ã²Ã¹</p></body></html>
I want this output:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body><p>èàéìòù</p></body></html>
I've tried this:
$oDom = new DomDocument('4.0', 'UTF-8');
and also '1.0' and other variations, but nothing works.
Another thing: is there a way to get the HTML back untouched?
For example, with the input <p>hello!</p>
I want the output <p>hello!</p>,
using DOMDocument only to parse the DOM and make some substitutions inside the tags.
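For the second question, one option is to pass LIBXML_HTML_NOIMPLIED and LIBXML_HTML_NODEFDTD to loadHTML(), so libxml neither adds the implied <html>/<body> wrappers nor the default DOCTYPE. This is a minimal sketch, assuming PHP 5.4+ built against libxml 2.7.8 or newer (where these flags are available); the variable names and the "bye!" substitution are illustrative only. NOIMPLIED reportedly misbehaves when the fragment has more than one top-level element, so wrapping the fragment in a single container is the safer assumption.

```php
<?php
$input = '<p>hello!</p>';

$dom = new DOMDocument();
// NOIMPLIED: don't wrap the fragment in <html><body>.
// NODEFDTD: don't add libxml's default DOCTYPE.
$dom->loadHTML($input, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

// Serialize before touching anything: no <html>, <body> or DOCTYPE added.
$before = $dom->saveHTML();
echo $before;

// Illustrative substitution inside a tag: replace the text node of the first <p>.
$p = $dom->getElementsByTagName('p')->item(0);
$p->firstChild->nodeValue = 'bye!';

$after = $dom->saveHTML();
echo $after;
```

saveHTML() may append a trailing newline, so trim the result if an exact string match matters.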
Solution:
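$oDom = new DOMDocument();
$oDom->encoding = 'utf-8';
// $sString holds the UTF-8 input. utf8_decode() converts it to ISO-8859-1,
// which is what loadHTML() assumes by default when the markup declares no charset.
$oDom->loadHTML( utf8_decode( $sString ) ); // important!
$sHtml = '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">';
// Passing a node makes saveHTML() output only that node, so no default DOCTYPE is emitted.
$sHtml .= $oDom->saveHTML( $oDom->documentElement ); // important!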
The saveHTML() method behaves differently when you pass it a node: it outputs only that node and its descendants. You can pass the root node ($oDom->documentElement) and prepend the desired !DOCTYPE manually.
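The difference shows up when comparing the two calls on the same document. In this minimal sketch (the sample markup and variable names are illustrative), saveHTML() with no argument serializes the whole document including libxml's default DOCTYPE, while saveHTML($node), available since PHP 5.3.6, serializes only the given node, so the DOCTYPE has to be added by hand.

```php
<?php
$dom = new DOMDocument();
$dom->loadHTML('<p>hello!</p>');

// Whole document: libxml's default HTML 4.0 Transitional DOCTYPE is included.
$full = $dom->saveHTML();

// Only the root <html> element and its descendants: no DOCTYPE.
$root = $dom->saveHTML($dom->documentElement);

// Prepend whichever DOCTYPE is actually wanted.
$doctype = '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" '
    . '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">';
$html = $doctype . "\n" . $root;

echo $html;
```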
The other important part is utf8_decode().
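utf8_decode() helps because loadHTML() treats input without a charset declaration as ISO-8859-1, so converting the string to ISO-8859-1 first makes the bytes match that assumption. Two caveats: utf8_decode() is deprecated as of PHP 8.2, and any character outside ISO-8859-1 (for example €) is replaced with "?". A commonly used alternative, swapped in here and not part of the solution above, is to prepend a meta charset declaration so libxml reads the input as UTF-8. This is a minimal sketch under that assumption; the sample string is illustrative.

```php
<?php
$sString = 'èàéìòù €'; // UTF-8 input; € is outside ISO-8859-1

$oDom = new DOMDocument();
// The http-equiv meta tells libxml's HTML parser that the bytes are UTF-8,
// so no conversion to ISO-8859-1 is needed beforehand.
$oDom->loadHTML('<meta http-equiv="Content-Type" content="text/html; charset=utf-8">' . $sString);

// DOM text is always exposed as UTF-8, so this should match the input.
$text = $oDom->getElementsByTagName('body')->item(0)->textContent;
echo $text, "\n";
```

The injected <meta> ends up in the serialized <head>, so remove it afterwards if it is not wanted in the output.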
In my case, none of the other properties and methods of the DOMDocument class produced the desired result.