How to saveHTML of DOMDocument without HTML wrapper?

Question 1

How to saveHTML of DOMDocument without HTML wrapper?

php serialization domdocument

Scott B · Feb 2, 2011 · Viewed 56.9k times · Source

Answer

All of these answers are now wrong: as of PHP 5.4 and Libxml 2.6, loadHTML() has an $options parameter that tells Libxml how it should parse the content.

Therefore, if we load the HTML with these options:

$html->loadHTML($content, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

then saveHTML() will output no doctype, no <html>, and no <body>.

LIBXML_HTML_NOIMPLIED turns off the automatic adding of implied html/body elements. LIBXML_HTML_NODEFDTD prevents a default doctype from being added when one is not found.
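
As a minimal sketch of the two flags in use (the $content fragment and variable names here are made up for illustration, and the fragment is deliberately wrapped in a single root element):

```php
<?php
// Hypothetical fragment with a single root element, for illustration only.
$content = '<div><h1>Title</h1><p>Some text</p></div>';

$doc = new DOMDocument();
// NOIMPLIED: do not add implied <html>/<body>; NODEFDTD: do not add a default doctype.
$doc->loadHTML($content, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

// saveHTML() serializes the document as it was parsed, so no wrapper appears.
$output = trim($doc->saveHTML());
echo $output, PHP_EOL;
```

The single wrapping <div> is a deliberate choice: with LIBXML_HTML_NOIMPLIED, a fragment that has several top-level nodes may come back restructured, because the parser appears to treat the first element as the document root.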

Full documentation about the Libxml parameters is in the PHP manual.

(Note that the loadHTML docs say that Libxml 2.6 is needed, but LIBXML_HTML_NODEFDTD is only available as of Libxml 2.7.8, and LIBXML_HTML_NOIMPLIED as of Libxml 2.7.7.)
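
Given those version requirements, it may be worth checking the Libxml build before relying on the flags. The sketch below assumes a helper named canLoadHtmlWithoutWrapper(), which is hypothetical and not part of PHP; LIBXML_DOTTED_VERSION and version_compare() are standard.

```php
<?php
// Hypothetical helper: true when the libxml that PHP was built against
// is new enough for both flags (2.7.8 covers the 2.7.7 requirement too).
function canLoadHtmlWithoutWrapper(): bool
{
    return version_compare(LIBXML_DOTTED_VERSION, '2.7.8', '>=')
        && defined('LIBXML_HTML_NOIMPLIED')
        && defined('LIBXML_HTML_NODEFDTD');
}

// Fall back to no options (wrappers will be added) on an older libxml.
$options = canLoadHtmlWithoutWrapper()
    ? (LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD)
    : 0;
```

On an older build, $options is 0 and the wrappers would still have to be stripped some other way.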

Question 2

In the function below, I'm struggling to output the DOMDocument without it adding the XML, HTML, body and p tag wrappers around the content. The suggested fix:

$postarray['post_content'] = $d->saveXML($d->getElementsByTagName('p')->item(0));

only works when the content has no block-level elements inside it. When it does, as in the example below with the h1 element, the output from saveXML is truncated to:

<p>If you like</p>

I've been pointed to this post as a possible workaround, but I can't understand how to implement it into this solution (see commented out attempts below).

Any suggestions?

function rseo_decorate_keyword($postarray) {
    global $post;
    $keyword = "Jasmine Tea";
    $content = "If you like <h1>jasmine tea</h1> you will really like it with Jasmine Tea flavors. This is the last ocurrence of the phrase jasmine tea within the content. If there are other instances of the keyword jasmine tea within the text what happens to jasmine tea.";
    // translate() below lowercases the text, so compare against a lowercase keyword
    $needle = strtolower($keyword);
    $d = new DOMDocument();
    @$d->loadHTML($content);
    $x = new DOMXpath($d);
    $count = $x->evaluate("count(//text()[contains(translate(., 'ABCDEFGHJIKLMNOPQRSTUVWXYZ', 'abcdefghjiklmnopqrstuvwxyz'), '$needle') and (ancestor::b or ancestor::strong)])");
    if ($count > 0) return $postarray;
    $nodes = $x->query("//text()[contains(translate(., 'ABCDEFGHJIKLMNOPQRSTUVWXYZ', 'abcdefghjiklmnopqrstuvwxyz'), '$needle') and not(ancestor::h1) and not(ancestor::h2) and not(ancestor::h3) and not(ancestor::h4) and not(ancestor::h5) and not(ancestor::h6) and not(ancestor::b) and not(ancestor::strong)]");
    if ($nodes && $nodes->length) {
        $node = $nodes->item(0);
        // Split just before the keyword (case-insensitive, to match the XPath test)
        $keynode = $node->splitText(stripos($node->textContent, $keyword));
        // Split after the keyword
        $node->nextSibling->splitText(strlen($keyword));
        // Replace keyword with <b>keyword</b>
        $replacement = $d->createElement('strong', $keynode->textContent);
        $keynode->parentNode->replaceChild($replacement, $keynode);
    }
    $postarray['post_content'] = $d->saveXML($d->getElementsByTagName('p')->item(0));
    // $postarray['post_content'] = $d->saveXML($d->getElementsByTagName('body')->item(1));
    // $postarray['post_content'] = $d->saveXML($d->getElementsByTagName('body')->childNodes);
    return $postarray;
}
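
Neither commented-out attempt can work as written: the document has only one body element, so item(1) is null, and childNodes is a node list rather than a single node. One possible approach, sketched here with a hypothetical helper named bodyInnerHtml() and a shortened version of the content above, is to serialize each child of the body in turn:

```php
<?php
// Hypothetical helper: returns the markup inside <body>, without the wrappers.
function bodyInnerHtml(DOMDocument $doc): string
{
    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }
    $html = '';
    foreach ($body->childNodes as $child) {
        // saveHTML() accepts a node and serializes only that subtree.
        $html .= $doc->saveHTML($child);
    }
    return $html;
}

$d = new DOMDocument();
@$d->loadHTML('If you like <h1>jasmine tea</h1> you will really like it.');
$inner = bodyInnerHtml($d);
echo $inner, PHP_EOL;
```

This keeps the h1 and the text after it, though the parser may still wrap the leading text in a p element, as in the truncated output above.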
