XML validation against given DTD in PHP

Allanrbo · Aug 13, 2009

In PHP, I am trying to validate an XML document using a DTD specified by my application, not by the externally fetched XML document. The validate method of the DOMDocument class seems to validate only against the DTD specified by the XML document itself, so it will not work here.

Can this be done, and how, or do I have to translate my DTD to an XML schema so I can use the schemaValidate method?

(This seems to have been asked before in Validate XML using a custom DTD in PHP, but without a correct answer, since the solution there relies only on the DTD specified by the target XML.)
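For comparison, the schemaValidate route the question mentions would look roughly like the sketch below. The schema is a hand translation (an assumption, not taken from the original post) of a DTD declaring <!ELEMENT foo (bar)> and <!ELEMENT bar (#PCDATA)>. It uses schemaValidateSource() rather than schemaValidate() so the schema can live in a string instead of a file:

```php
<?php

// Hand-translated XSD (assumption): roughly equivalent to
//   <!ELEMENT foo (bar)>
//   <!ELEMENT bar (#PCDATA)>
$xsd = <<<'XSD'
<?xml version="1.0" encoding="utf-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="foo">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="bar" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
XSD;

// Collect validation errors instead of letting libxml print PHP warnings.
libxml_use_internal_errors(true);

$valid = new DOMDocument();
$valid->loadXML('<foo><bar>baz</bar></foo>');

$invalid = new DOMDocument();
$invalid->loadXML('<foo><qux/></foo>');

// schemaValidateSource() takes the schema as a string, so no file is needed.
$goodResult = $valid->schemaValidateSource($xsd);
$badResult = $invalid->schemaValidateSource($xsd);

echo $goodResult ? "valid\n" : "invalid\n"; // valid
echo $badResult ? "valid\n" : "invalid\n";  // invalid
```

The trade-off is the one the question points at: you have to maintain an XSD alongside (or instead of) the DTD.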

Answer

mercator · Aug 14, 2009

Note: XML validation could be subject to the Billion Laughs attack, and similar DoS vectors.
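One crude pre-check, sketched here as an assumption and not part of the original answer, is to refuse untrusted input that declares its own entities before it ever reaches loadXML(), since Billion Laughs payloads depend on nested entity declarations. The helper name hasEntityDeclarations is hypothetical, and the plain string search assumes an ASCII-compatible encoding such as UTF-8:

```php
<?php

// Hypothetical helper: true if the raw XML declares any entities (general or
// parameter), which Billion Laughs-style payloads rely on. This is a plain
// string search, so it also rejects harmless documents that merely mention
// "<!ENTITY" in a comment or CDATA section; it errs on the safe side.
// Assumes an ASCII-compatible encoding such as UTF-8.
function hasEntityDeclarations(string $xml): bool
{
    return stripos($xml, '<!ENTITY') !== false;
}

$payload = <<<END
<?xml version="1.0"?>
<!DOCTYPE foo [
  <!ENTITY lol "lol">
  <!ENTITY lol2 "&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;">
]>
<foo>&lol2;</foo>
END;

var_dump(hasEntityDeclarations($payload));           // bool(true)
var_dump(hasEntityDeclarations('<foo><bar/></foo>')); // bool(false)
```

This is deliberately blunt; it is a first filter, not a replacement for keeping libxml's own limits in place.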

This essentially does what rojoca mentioned in his comment:

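<?php

$xml = <<<END
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE foo SYSTEM "foo.dtd">
<foo>
    <bar>baz</bar>
</foo>
END;

$root = 'foo';

// Parse the original document (its own DOCTYPE points at foo.dtd).
$old = new DOMDocument;
$old->loadXML($xml);

// Build an empty document whose DOCTYPE points at our own DTD instead.
// Empty strings (rather than null) for the unused public ID and qualified
// name avoid deprecation notices on PHP 8.1+.
$creator = new DOMImplementation;
$doctype = $creator->createDocumentType($root, '', 'bar.dtd');
$new = $creator->createDocument(null, '', $doctype);
$new->encoding = "utf-8";

// Copy the root element (with everything under it) into the new document.
$oldNode = $old->getElementsByTagName($root)->item(0);
$newNode = $new->importNode($oldNode, true);
$new->appendChild($newNode);

// validate() returns true or false and emits a warning for each violation.
if ($new->validate()) {
    echo "Valid against bar.dtd\n";
} else {
    echo "Not valid against bar.dtd\n";
}

?>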

This validates the document against bar.dtd instead of the foo.dtd named in its original DOCTYPE.

You can't just call $new->loadXML(), because that would simply set the DOCTYPE back to the original one. And since the doctype property of a DOMDocument object is read-only, you have to copy the root node (with everything in it) into a new DOM document instead.
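The steps from the answer's code can be wrapped in a reusable function. This is a sketch under a few assumptions: the name validateAgainstDtd is hypothetical, the arrow functions need PHP 7.4+, and libxml_use_internal_errors() is used to collect DTD violations as messages instead of letting validate() emit PHP warnings. The DTD is passed as a filesystem path (a temporary file in the usage part), since createDocumentType() only accepts a public and system identifier, not the DTD text itself:

```php
<?php

// Hypothetical helper: re-homes the root element of $xml into a fresh
// document whose DOCTYPE points at $dtdPath, then validates it.
// Returns the validation messages; an empty array means "valid".
function validateAgainstDtd(string $xml, string $dtdPath): array
{
    $previous = libxml_use_internal_errors(true); // collect instead of warn
    libxml_clear_errors();

    $old = new DOMDocument();
    if (!$old->loadXML($xml)) {
        $messages = array_map(fn($e) => trim($e->message), libxml_get_errors());
        libxml_clear_errors();
        libxml_use_internal_errors($previous);
        return $messages ?: ['could not parse XML'];
    }

    // Same technique as above: new document, our DOCTYPE, imported root.
    $root = $old->documentElement;
    $impl = new DOMImplementation();
    $doctype = $impl->createDocumentType($root->nodeName, '', $dtdPath);
    $new = $impl->createDocument(null, '', $doctype);
    $new->appendChild($new->importNode($root, true));

    $valid = $new->validate();
    $messages = array_map(fn($e) => trim($e->message), libxml_get_errors());
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $valid ? [] : ($messages ?: ['invalid']);
}

// Usage: write a small DTD to a temporary file and validate two documents.
$dtd = tempnam(sys_get_temp_dir(), 'dtd');
file_put_contents($dtd, "<!ELEMENT foo (bar)>\n<!ELEMENT bar (#PCDATA)>\n");

$good = validateAgainstDtd('<foo><bar>baz</bar></foo>', $dtd);
$bad  = validateAgainstDtd('<foo><qux/></foo>', $dtd);

echo $good === [] ? "good: valid\n" : "good: invalid\n";
foreach ($bad as $message) {
    echo "bad: ", $message, "\n";
}

unlink($dtd);
```

Returning the messages rather than a bare boolean makes it easier to report why a document was rejected.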

I only just had a go with this myself, so I'm not entirely sure if this covers everything, but it definitely works for the XML in my example.

Of course, the quick-and-dirty solution would be to first get the XML as a string, replace the original DOCTYPE with one pointing at your own DTD, and then load it.
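The quick-and-dirty route could look something like the following sketch. replaceDoctype is a hypothetical helper, and its regular expression is an assumption that only copes with a DOCTYPE without an internal subset ([...]); anything else makes it throw rather than guess:

```php
<?php

// Hypothetical helper: swap the DOCTYPE in the raw XML string for one that
// points at our own DTD. Only handles a DOCTYPE without an internal subset.
function replaceDoctype(string $xml, string $root, string $dtdPath): string
{
    $doctype = '<!DOCTYPE ' . $root . ' SYSTEM "' . $dtdPath . '">';
    // A callback keeps "$1" or "\1" inside $dtdPath from being treated
    // as regex backreferences.
    $result = preg_replace_callback(
        '/<!DOCTYPE[^>\[]*>/',
        fn(array $m): string => $doctype,
        $xml,
        1,
        $count
    );
    if ($count !== 1) {
        throw new InvalidArgumentException('No simple DOCTYPE found to replace');
    }
    return $result;
}

$xml = <<<END
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE foo SYSTEM "foo.dtd">
<foo>
    <bar>baz</bar>
</foo>
END;

$patched = replaceDoctype($xml, 'foo', 'bar.dtd');

$doc = new DOMDocument();
$doc->loadXML($patched);
echo $doc->doctype->systemId, "\n"; // bar.dtd
// $doc->validate() would now check against bar.dtd, which must exist on disk.
```

Because it works on the raw string, this is more fragile than copying the root node (for example, a "<!DOCTYPE" inside a comment before the real one would be replaced instead).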