What's the difference between PHP's DOM and SimpleXML extensions?

Stann · Jan 26, 2011

I fail to see why we need two XML parsers in PHP.

Can someone explain the difference between those two?

Answer

Gordon · Jan 26, 2011

In a nutshell:

SimpleXml

  • is for simple XML and/or simple use cases
  • has a limited API for working with nodes (e.g. you cannot program to an interface that much)
  • represents all nodes with the same class (an element node is the same kind of object as an attribute node)
  • makes nodes magically accessible, e.g. $root->foo->bar['attribute']
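To illustrate the "magic" access mentioned above, here is a minimal sketch. The sample document and its element and attribute names are made up for the example.

```php
<?php
// Hypothetical sample document; the names are made up for illustration.
$xml = '<root><foo><bar attribute="value">text</bar></foo></root>';

$root = simplexml_load_string($xml);

// Child elements are reached through property access,
// attributes through array access on the element.
$bar = $root->foo->bar;

echo (string) $bar, PHP_EOL;              // text content of <bar>
echo (string) $bar['attribute'], PHP_EOL; // value of the attribute

// Elements and attributes alike come back as SimpleXMLElement objects.
var_dump($bar instanceof SimpleXMLElement);
var_dump($bar['attribute'] instanceof SimpleXMLElement);
```

Note how the PHP expression mirrors the document structure: if the XML layout changes, the access path has to change with it.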

DOM

  • is for any XML use case you might have
  • is an implementation of the W3C DOM API (found implemented in many languages)
  • differentiates between various node types (more control)
  • is much more verbose due to its explicit API (you can code to an interface)
  • can parse broken HTML
  • allows you to use PHP functions in XPath queries
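As a rough sketch of two of the points above (distinct node types and PHP functions in XPath), the following uses the same made-up sample document as a SimpleXML example would. The choice of strtoupper is only for illustration.

```php
<?php
$doc = new DOMDocument();
$doc->loadXML('<root><foo><bar attribute="value">text</bar></foo></root>');

$xpath = new DOMXPath($doc);

// Each kind of node has its own class with its own methods.
$bar  = $xpath->query('/root/foo/bar')->item(0); // DOMElement
$attr = $bar->getAttributeNode('attribute');     // DOMAttr
$text = $bar->firstChild;                        // DOMText

echo get_class($bar), ' ', get_class($attr), ' ', get_class($text), PHP_EOL;

// PHP functions in XPath: bind the "php" prefix, then register the
// functions that queries are allowed to call.
$xpath->registerNamespace('php', 'http://php.net/xpath');
$xpath->registerPhpFunctions('strtoupper');

$matches = $xpath->query(
    '//bar[php:functionString("strtoupper", @attribute) = "VALUE"]'
);
echo $matches->length, PHP_EOL;
```

The verbosity is visible here: every step is an explicit method call, but in exchange each object only offers the operations that make sense for its node type.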

Both of these are based on libxml and can be influenced to some extent by the libxml functions.
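A small sketch of what that influence can look like: libxml_use_internal_errors() switches both extensions from emitting PHP warnings to collecting errors internally. The malformed inputs below are made up for the example.

```php
<?php
// Collect libxml errors internally instead of emitting PHP warnings.
$previous = libxml_use_internal_errors(true);

// SimpleXML: mismatched tags make the load fail, and the reason
// can be read back through the libxml functions.
$broken    = simplexml_load_string('<foo><bar></foo>');
$xmlErrors = libxml_get_errors();
libxml_clear_errors();

// DOM: the HTML parser recovers from unclosed <p> and <b> tags.
$doc = new DOMDocument();
$doc->loadHTML('<html><body><p>Hello <b>world</body></html>');
$bold = $doc->getElementsByTagName('b')->item(0)->textContent;
libxml_clear_errors();

// Restore whatever error handling mode was active before.
libxml_use_internal_errors($previous);

var_dump($broken);           // false, the XML was not well-formed
echo count($xmlErrors), PHP_EOL;
echo $bold, PHP_EOL;
```

The same call also shows the "can parse broken HTML" point: DOMDocument::loadHTML() still yields a usable tree, whereas the XML loaders reject the malformed input.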


Personally, I don't like SimpleXml too much. That's because I don't like the implicit access to the nodes, e.g. $foo->bar[1]->baz['attribute']. It ties the actual XML structure to the programming interface. The one-node-type-for-everything is also somewhat unintuitive because the behavior of the SimpleXmlElement magically changes depending on its contents.

For instance, when you have <foo bar="1"/>, the object dump of /foo/@bar will be identical to that of /foo, but echoing them will print different results. Moreover, because both of them are SimpleXmlElements, you can call the same methods on them, but the methods only take effect when the SimpleXmlElement supports them, e.g. trying to do $el->addAttribute('foo', 'bar') on the first SimpleXmlElement will do nothing. Now of course it is correct that you cannot add an attribute to an attribute node, but the point is that a dedicated attribute node class would not expose that method in the first place.
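A minimal sketch of that contrast, using the <foo bar="1"/> document from the paragraph above. It only checks the parts that are easy to verify (class identity, string output, and which methods exist); it does not try to reproduce the object dump or the addAttribute() behavior, which may vary between PHP versions.

```php
<?php
$foo  = simplexml_load_string('<foo bar="1"/>');
$attr = $foo->xpath('/foo/@bar')[0];

// SimpleXML: element and attribute are instances of the same class...
var_dump(get_class($foo) === get_class($attr));

// ...but echoing them gives different strings.
echo '[', $foo, '] [', $attr, ']', PHP_EOL;

// DOM: the attribute is a DOMAttr, and DOMAttr simply has no
// setAttribute() method, while DOMElement does.
$doc = new DOMDocument();
$doc->loadXML('<foo bar="1"/>');
$domElement = $doc->documentElement;
$domAttr    = $domElement->getAttributeNode('bar');

var_dump(method_exists($domAttr, 'setAttribute'));
var_dump(method_exists($domElement, 'setAttribute'));
```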

But that's just my 2c. Make up your own mind :)


On a side note, there are not just two XML parsers in PHP, but a few more. SimpleXml and DOM are just the two that parse a document into a tree structure. The others are either pull-based or event-based parsers/readers/writers.
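For comparison, here is a short sketch of the pull-based style using XMLReader, one of those other extensions. The sample document is made up for the example.

```php
<?php
// Hypothetical sample document.
$xml = '<items><item id="1">a</item><item id="2">b</item></items>';

$reader = new XMLReader();
$reader->XML($xml);

// Pull parsing: the code asks for the next node instead of
// receiving a fully built tree.
$ids = [];
while ($reader->read()) {
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'item') {
        $ids[] = $reader->getAttribute('id');
    }
}
$reader->close();

print_r($ids);
```

Because only the current node is held in memory, this style tends to suit large documents better than building a full tree with SimpleXml or DOM.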

Also see my answer to