Can ElementTree be told to preserve the order of attributes?

Question 1

Can ElementTree be told to preserve the order of attributes?

python xml elementtree

dmckee --- ex-moderator kitten · Apr 30, 2010 · Viewed 27k times · Source

Answer

Answer

With help from @bobince's answer and these two (setting attribute order, overriding module methods)

I managed to get this monkey patched it's dirty and I'd suggest using another module that better handles this scenario but when that isn't a possibility:

# =======================================================================
# Monkey patch ElementTree
import xml.etree.ElementTree as ET

def _serialize_xml(write, elem, encoding, qnames, namespaces):
    tag = elem.tag
    text = elem.text
    if tag is ET.Comment:
        write("<!--%s-->" % ET._encode(text, encoding))
    elif tag is ET.ProcessingInstruction:
        write("<?%s?>" % ET._encode(text, encoding))
    else:
        tag = qnames[tag]
        if tag is None:
            if text:
                write(ET._escape_cdata(text, encoding))
            for e in elem:
                _serialize_xml(write, e, encoding, qnames, None)
        else:
            write("<" + tag)
            items = elem.items()
            if items or namespaces:
                if namespaces:
                    for v, k in sorted(namespaces.items(),
                                       key=lambda x: x[1]):  # sort on prefix
                        if k:
                            k = ":" + k
                        write(" xmlns%s=\"%s\"" % (
                            k.encode(encoding),
                            ET._escape_attrib(v, encoding)
                            ))
                #for k, v in sorted(items):  # lexical order
                for k, v in items: # Monkey patch
                    if isinstance(k, ET.QName):
                        k = k.text
                    if isinstance(v, ET.QName):
                        v = qnames[v.text]
                    else:
                        v = ET._escape_attrib(v, encoding)
                    write(" %s=\"%s\"" % (qnames[k], v))
            if text or len(elem):
                write(">")
                if text:
                    write(ET._escape_cdata(text, encoding))
                for e in elem:
                    _serialize_xml(write, e, encoding, qnames, None)
                write("</" + tag + ">")
            else:
                write(" />")
    if elem.tail:
        write(ET._escape_cdata(elem.tail, encoding))

ET._serialize_xml = _serialize_xml

from collections import OrderedDict

class OrderedXMLTreeBuilder(ET.XMLTreeBuilder):
    def _start_list(self, tag, attrib_in):
        fixname = self._fixname
        tag = fixname(tag)
        attrib = OrderedDict()
        if attrib_in:
            for i in range(0, len(attrib_in), 2):
                attrib[fixname(attrib_in[i])] = self._fixtext(attrib_in[i+1])
        return self._target.start(tag, attrib)

# =======================================================================

Then in your code:

tree = ET.parse(pathToFile, OrderedXMLTreeBuilder())

Question 2

I've written a fairly simple filter in python using ElementTree to munge the contexts of some xml files. And it works, more or less.

But it reorders the attributes of various tags, and I'd like it to not do that.

Does anyone know a switch I can throw to make it keep them in specified order?

Context for this

I'm working with and on a particle physics tool that has a complex, but oddly limited configuration system based on xml files. Among the many things setup that way are the paths to various static data files. These paths are hardcoded into the existing xml and there are no facilities for setting or varying them based on environment variables, and in our local installation they are necessarily in a different place.

This isn't a disaster because the combined source- and build-control tool we're using allows us to shadow certain files with local copies. But even thought the data fields are static the xml isn't, so I've written a script for fixing the paths, but with the attribute rearrangement diffs between the local and master versions are harder to read than necessary.

This is my first time taking ElementTree for a spin (and only my fifth or sixth python project) so maybe I'm just doing it wrong.

Abstracted for simplicity the code looks like this:

tree = elementtree.ElementTree.parse(inputfile)
i = tree.getiterator()
for e in i:
    e.text = filter(e.text)
tree.write(outputfile)

Reasonable or dumb?

Can ElementTree be told to preserve the order of attributes?

Context for this

Answer

Related questions