XML Parsing: ElementTree (etree) vs. minidom

vy32 · Nov 5, 2011 · Viewed 18.3k times

I've been using minidom to parse XML for years. Now I've suddenly learned about ElementTree. My question: which is better for parsing? That is:

  • Which is faster?
  • Which uses less memory?
  • Does either have any O(n^2) behavior I should worry about?
  • Is one being deprecated in favor of the other?

Why do we have two interfaces?

Thanks.

Answer

Raymond Hettinger · Nov 5, 2011

The DOM and SAX interfaces are the classic ways to work with XML. Python had to provide those interfaces because they are well-known and standard.

The ElementTree package was intended to provide a more Pythonic interface. It is all about making things easier for the programmer.
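To make the difference in style concrete, here is a minimal sketch that extracts the same data through both interfaces. The sample document and the helper names `titles_minidom` and `titles_etree` are made up for illustration; only the standard-library calls are real.

```python
import xml.dom.minidom as minidom
import xml.etree.ElementTree as ET

# Hypothetical sample document, used only for this comparison.
XML = """<library>
  <book id="1"><title>Dune</title></book>
  <book id="2"><title>Emma</title></book>
</library>"""


def titles_minidom(text):
    """Collect (id, title) pairs using the W3C-style DOM API."""
    doc = minidom.parseString(text)
    result = []
    for book in doc.getElementsByTagName("book"):
        title = book.getElementsByTagName("title")[0]
        # Text lives in a separate child node, so it takes one more hop.
        result.append((book.getAttribute("id"), title.firstChild.data))
    return result


def titles_etree(text):
    """Collect the same pairs using ElementTree."""
    root = ET.fromstring(text)
    # Attributes and child text are reachable directly from the element.
    return [(book.get("id"), book.findtext("title")) for book in root.iter("book")]
```

Both functions return the same list of pairs; the ElementTree version is simply shorter because elements expose attributes and text without separate node objects.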

Depending on your build, each of those has an underlying C implementation that makes it run fast.

None of the above tools is being deprecated. They each have their merits (SAX doesn't need to read the whole input into memory, for example).
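The point about SAX and memory can be sketched as follows. This is an illustration, not a benchmark: the `TitleHandler` class, the `stream_titles` helper, and the `title` element name are assumptions invented for the example. The handler reacts to parse events as they arrive and keeps only the strings it cares about, so no document tree is ever built.

```python
import io
import xml.sax


class TitleHandler(xml.sax.ContentHandler):
    """Hypothetical handler that collects the text of every <title> element."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False
        self._chunks = []

    def startElement(self, name, attrs):
        if name == "title":
            self._in_title = True
            self._chunks = []

    def characters(self, content):
        # The parser may deliver text in several pieces, so accumulate them.
        if self._in_title:
            self._chunks.append(content)

    def endElement(self, name):
        if name == "title":
            self.titles.append("".join(self._chunks))
            self._in_title = False


def stream_titles(source):
    """Parse a filename or file-like object and return the collected titles."""
    handler = TitleHandler()
    xml.sax.parse(source, handler)
    return handler.titles
```

For example, `stream_titles(io.BytesIO(data))` works on in-memory bytes, and passing a filename instead lets the parser read a large file incrementally.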

There is also a third-party module called lxml, which is a popular choice (full-featured and fast).