I've written a fairly simple filter in python using ElementTree to munge the contexts of some xml files. And it works, more or less.
But it reorders the attributes of various tags, and I'd like it to not do that.
Does anyone know a switch I can throw to make it keep them in specified order?
I'm working with and on a particle physics tool that has a complex, but oddly limited configuration system based on xml files. Among the many things setup that way are the paths to various static data files. These paths are hardcoded into the existing xml and there are no facilities for setting or varying them based on environment variables, and in our local installation they are necessarily in a different place.
This isn't a disaster because the combined source- and build-control tool we're using allows us to shadow certain files with local copies. But even thought the data fields are static the xml isn't, so I've written a script for fixing the paths, but with the attribute rearrangement diffs between the local and master versions are harder to read than necessary.
This is my first time taking ElementTree for a spin (and only my fifth or sixth python project) so maybe I'm just doing it wrong.
Abstracted for simplicity the code looks like this:
tree = elementtree.ElementTree.parse(inputfile)
i = tree.getiterator()
for e in i:
e.text = filter(e.text)
tree.write(outputfile)
Reasonable or dumb?
Related links:
With help from @bobince's answer and these two (setting attribute order, overriding module methods)
I managed to get this monkey patched it's dirty and I'd suggest using another module that better handles this scenario but when that isn't a possibility:
# =======================================================================
# Monkey patch ElementTree
import xml.etree.ElementTree as ET
def _serialize_xml(write, elem, encoding, qnames, namespaces):
tag = elem.tag
text = elem.text
if tag is ET.Comment:
write("<!--%s-->" % ET._encode(text, encoding))
elif tag is ET.ProcessingInstruction:
write("<?%s?>" % ET._encode(text, encoding))
else:
tag = qnames[tag]
if tag is None:
if text:
write(ET._escape_cdata(text, encoding))
for e in elem:
_serialize_xml(write, e, encoding, qnames, None)
else:
write("<" + tag)
items = elem.items()
if items or namespaces:
if namespaces:
for v, k in sorted(namespaces.items(),
key=lambda x: x[1]): # sort on prefix
if k:
k = ":" + k
write(" xmlns%s=\"%s\"" % (
k.encode(encoding),
ET._escape_attrib(v, encoding)
))
#for k, v in sorted(items): # lexical order
for k, v in items: # Monkey patch
if isinstance(k, ET.QName):
k = k.text
if isinstance(v, ET.QName):
v = qnames[v.text]
else:
v = ET._escape_attrib(v, encoding)
write(" %s=\"%s\"" % (qnames[k], v))
if text or len(elem):
write(">")
if text:
write(ET._escape_cdata(text, encoding))
for e in elem:
_serialize_xml(write, e, encoding, qnames, None)
write("</" + tag + ">")
else:
write(" />")
if elem.tail:
write(ET._escape_cdata(elem.tail, encoding))
ET._serialize_xml = _serialize_xml
from collections import OrderedDict
class OrderedXMLTreeBuilder(ET.XMLTreeBuilder):
def _start_list(self, tag, attrib_in):
fixname = self._fixname
tag = fixname(tag)
attrib = OrderedDict()
if attrib_in:
for i in range(0, len(attrib_in), 2):
attrib[fixname(attrib_in[i])] = self._fixtext(attrib_in[i+1])
return self._target.start(tag, attrib)
# =======================================================================
Then in your code:
tree = ET.parse(pathToFile, OrderedXMLTreeBuilder())