Hi Stack Overflow community,
I would appreciate some guidance on modifying my XML file with Python and the ElementTree library.
For some background, I am not a student and work in industry. I hope to save myself a great deal of manual effort by making these changes automated and typically I would have just done this in a language such as C++ that I am more familiar with. However, there is a push to use Python in my group so I am using this as both a functional and learning exercise.
Thus, solution guidance is helpful, but where possible could you please also correct my use of terms and my understanding? I do not just want the code to work; I want to know that my understanding of how it works is correct.
Goal: remove the sub-element "weight" from the XML file.
Using the xml code (let's just say it is called "example.xml"):
<XML_level_1 created="2014-08-19 16:55:02" userID="User@company">
<XML_level_2 manufacturer="company" number="store-25235">
<padUnits value="mm" />
<partDescription value="Part description explained here" />
<weight value="5.2" />
</XML_level_2>
</XML_level_1>
Thus far, I have the following code:
from xml.etree import ElementTree
current_xml_tree = ElementTree.parse(filename_path) # Path to example.xml
current_xml_root = current_xml_tree.getroot()
current_xml_level_2_node = current_xml_root.findall('XML_level_2')
# Extract "weight" value for later use
for weight_value_elem in current_xml_root.iter('weight'):
    weight_value = weight_value_elem.get('value')
# Remove weight sub-element from XML
# -------------------------------------
# Get all nodes entitled 'weight' from element
weight_nodes = current_xml_root.findall('weight')
print weight_nodes # result is an empty list
print weight_value_elem # Location of element 'weight' is listed
for weight_node_loc in current_xml_tree.iter('weight'):
    print "for-loop check : loop has been entered"
    current_xml_tree.getroot().remove(weight_value_elem)
    print "for-loop has been processed"
print "Weight line removed from ", filename_path
# Write changes to XML File:
current_xml_tree.write(filename_path)
I have looked at many pages, and this one seems quite helpful: http://www.cmi.ac.in/~madhavan/courses/prog2-2015/docs/python-3.4.2-docs-html/library/xml.etree.elementtree.html. Even so, I have reached a point where I am stuck. Thank you all in advance!
I come from a finite element background, where nodes are understood as part of an element, defining portions / corner boundaries of what creates an element. However, am I wrong in thinking the terminology is used differently here so that nodes are not a subset of elements? Are the two terms still related in a similar way?
Removing an element from a tree, regardless of its location in the tree, is needlessly complicated by the ElementTree API. Specifically, no element knows its own parent, so we have to discover that relationship "by hand."
from xml.etree import ElementTree
XML = '''
<XML_level_1 created="2014-08-19 16:55:02" userID="User@company">
<XML_level_2 manufacturer="company" number="store-25235">
<padUnits value="mm" />
<partDescription value="Part description explained here" />
<weight value="5.2" />
</XML_level_2>
</XML_level_1>
'''
# parse the XML into a tree
root = ElementTree.XML(XML)
# Alternatively, parse the XML that lives in 'filename_path'
# tree = ElementTree.parse(filename_path)
# root = tree.getroot()
# Find the parent element of each "weight" element, using XPATH
for parent in root.findall('.//weight/..'):
    # Find each weight element that is a direct child of this parent
    for element in parent.findall('weight'):
        # Remove the weight element from its parent element
        parent.remove(element)
# Single-argument print() works in both Python 2 and Python 3
print(ElementTree.tostring(root))
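As for why the code in the question behaves as it does, two details of the standard-library API appear to explain it: `findall('weight')` with a bare tag name searches only the direct children of the element it is called on, and `remove()` removes only a direct child, raising `ValueError` otherwise. The following is a minimal sketch that reuses the example XML; the variable names are chosen for illustration.

```python
from xml.etree import ElementTree

XML = '''
<XML_level_1 created="2014-08-19 16:55:02" userID="User@company">
<XML_level_2 manufacturer="company" number="store-25235">
<padUnits value="mm" />
<partDescription value="Part description explained here" />
<weight value="5.2" />
</XML_level_2>
</XML_level_1>
'''

root = ElementTree.XML(XML)

# A bare tag name matches only direct children of root. The only child of
# root is XML_level_2, so this list is empty (as seen in the question).
direct_matches = root.findall('weight')

# './/weight' searches all descendants instead.
deep_matches = root.findall('.//weight')
weight = deep_matches[0]

# remove() only accepts a direct child; weight is a grandchild of root,
# so calling it on the root fails.
try:
    root.remove(weight)
    removed_from_root = True
except ValueError:
    removed_from_root = False

# Removing the element from its actual parent succeeds.
level_2 = root.find('XML_level_2')
level_2.remove(weight)
remaining = root.findall('.//weight')
```

This is why the loop in the question is entered (`iter('weight')` walks the whole tree) while the `remove()` call on the root cannot succeed.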
If you can switch to lxml, the loop is slightly less cumbersome, because lxml elements provide getparent(). Assuming tree was parsed with lxml.etree:
for weight in tree.findall('.//weight'):
    weight.getparent().remove(weight)
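If lxml is not an option, a comparable effect can be had with the standard library alone by building a child-to-parent dictionary once and looking parents up in it. This is only a sketch: `parent_map` and `weight_values` are names picked for illustration, and the code also records each `value` attribute before removal, as the question wanted.

```python
from xml.etree import ElementTree

XML = '''
<XML_level_1 created="2014-08-19 16:55:02" userID="User@company">
<XML_level_2 manufacturer="company" number="store-25235">
<padUnits value="mm" />
<partDescription value="Part description explained here" />
<weight value="5.2" />
</XML_level_2>
</XML_level_1>
'''

root = ElementTree.XML(XML)

# Map every element to its parent. The root itself gets no entry.
parent_map = {child: parent for parent in root.iter() for child in parent}

weight_values = []
# list() takes a snapshot first, so the tree is not modified while it is
# being iterated over.
for weight in list(root.iter('weight')):
    weight_values.append(weight.get('value'))  # keep the value for later use
    parent_map[weight].remove(weight)
```

When working from a file, the same loop would run on `tree.getroot()`, followed by `tree.write(filename_path)` as in the question.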
As to your second question, the ElementTree documentation uses "node" more or less interchangeably with "element." More specifically, it appears to use the word "node" to refer either to a Python object of type "Element" or to the XML element that such an object represents. Unlike in finite element terminology, a node here is not a component of an element.
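In concrete terms, each tag in the file becomes one `Element` object that carries a tag name, a dictionary of attributes, and a list-like sequence of child elements. As far as I can tell, `xml.etree.ElementTree` has no separate "node" class. A brief sketch, using a trimmed version of the example XML:

```python
from xml.etree import ElementTree

root = ElementTree.XML(
    '<XML_level_1><XML_level_2 number="store-25235">'
    '<weight value="5.2" /></XML_level_2></XML_level_1>'
)

# Children are reached by indexing or iterating, like a list.
level_2 = root[0]
weight = level_2[0]

tag_name = root.tag                   # the element's name as a string
attributes = level_2.attrib           # attributes as a plain dict
weight_value = weight.get('value')    # one attribute, returned as a string

# Every object in the tree is an Element, whatever its depth.
all_elements = all(isinstance(e, ElementTree.Element) for e in root.iter())
```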