I am currently using Python 2.4.3 and am not allowed to upgrade.
I want to change the value of a given attribute in one or more tags, while keeping the XML comments in the updated file.
I have managed to create a Python script that takes an XML file as an argument and, for each specified tag, changes an attribute, as shown below:
def update(file, state):
    global Etree
    try:
        # Bind the module to the name used below (Etree)
        from elementtree import ElementTree as Etree
        print '*** using ElementTree'
    except ImportError, e:
        print '***'
        print '*** Error: Must install either ElementTree or lxml.'
        print '***'
        raise ImportError, 'must install either ElementTree or lxml'
    #end try
    doc = Etree.parse(file)
    root = doc.getroot()
    for element in root.findall('.//StateManageable'):
        element.attrib['initialState'] = state
    #end for
    doc.write(file)
#end def
This all works: the "initialState" attributes are updated. However, my original XML contains a lot of XML comments as well, and they are gone from the output, which is bad.
I suspect that parse only retrieves the XML structure, but I thought XML comments were part of that structure. I also realize that the "human-readable" formatting of my original document is gone, but I understand that this is expected behavior and that I need to reformat afterwards using xmllint --format or XSL.
I know this is old now, but I stumbled across the answer above about how to retain comments. Fredrik's published instructions on how to put comments into the tree still work with current versions of ElementTree, but they do more than I need, at least for my use. They wrap the XML in an extra element, which is undesirable for me. I also don't need processing instructions preserved, only comments. So I trimmed the class he provided on the site down to this:
import xml.etree.ElementTree as ET

class PCParser(ET.XMLTreeBuilder):

    def __init__(self):
        ET.XMLTreeBuilder.__init__(self)
        # assumes ElementTree 1.2.X
        self._parser.CommentHandler = self.handle_comment

    def handle_comment(self, data):
        self._target.start(ET.Comment, {})
        self._target.data(data)
        self._target.end(ET.Comment)
To use this, create an instance of the class and pass it as the parser argument to ElementTree.parse(), like this:
parser = PCParser()
self.tree = ET.parse(self.templateOut, parser=parser)
I take no credit whatsoever for the code, or for the undocumented use of ElementTree, but it works for me: it preserves only comments, without affecting the original document structure. Note that because it relies on ElementTree internals (_parser and _target), any future change to ElementTree could break it, although that seems unlikely after all these years.
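For anyone who is not tied to an old interpreter, here is a minimal sketch of the same idea using only documented API. It assumes Python 3.8 or later, where TreeBuilder accepts insert_comments=True, so it will not help on 2.4.3. The function name update_initial_state and the SAMPLE document are made up for illustration; only the StateManageable tag and the initialState attribute come from the question. As far as I can tell, comments that sit outside the root element are still not kept this way.

```python
import io
import xml.etree.ElementTree as ET


def update_initial_state(source, state):
    """Parse XML from a file name or file object, set initialState on every
    StateManageable element, and return the tree with comments kept as nodes.
    (Hypothetical helper, named here for illustration.)"""
    # insert_comments=True (Python 3.8+) makes the builder add comment nodes
    # to the tree instead of silently dropping them.
    builder = ET.TreeBuilder(insert_comments=True)
    parser = ET.XMLParser(target=builder)
    tree = ET.parse(source, parser=parser)
    for element in tree.getroot().findall('.//StateManageable'):
        element.set('initialState', state)
    return tree


# Made-up sample document, only to show the round trip.
SAMPLE = (
    b'<config>'
    b'<!-- keep me -->'
    b'<StateManageable initialState="STOP"/>'
    b'</config>'
)

tree = update_initial_state(io.BytesIO(SAMPLE), 'START')
result = ET.tostring(tree.getroot(), encoding='unicode')
print(result)
```

With a real file you would pass the file name instead of the BytesIO object and finish with tree.write(filename), as in the question. The original indentation is still not restored, so the xmllint --format step mentioned in the question remains necessary.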