Python 3 parse xml file with ElementTree

James · Oct 18, 2018 · Viewed 9k times

I have an XML file that I am trying to read and extract data from. Below is an extract from the file:

<Variable name="Inboard_ED_mm" state="Output" type="double[]">17.154, 17.154, 17.154, 17.154, 17.154, 17.154, 17.154, 17.154, 17.154, 17.154, 17.154, 17.154, 17.154, 17.154<Properties><Property name="index">25</Property><Property name="description"></Property><Property name="upperBound">0</Property><Property name="hasUpperBound">false</Property><Property name="lowerBound">0</Property><Property name="hasLowerBound">false</Property><Property name="units"></Property><Property name="enumeratedValues"></Property><Property name="enumeratedAliases"></Property><Property name="validity">true</Property><Property name="autoSize">true</Property><Property name="userSlices"></Property></Properties></Variable>

I am trying to extract the following, 17.154, 17.154, 17.154, 17.154, 17.154, 17.154, 17.154, 17.154, 17.154, 17.154, 17.154, 17.154, 17.154, 17.154

I have worked through the example here, xml.etree.ElementTree — The ElementTree XML API and I can get the example to work, but when I modify the code for the above xml, the code returns nothing!

Here is my code,

import os
import xml.etree.ElementTree as ET
work_dir = r"C:\Temp\APROCONE\Python"

with open(os.path.join(work_dir, "model.xml"), 'rt') as f:
    tree = ET.parse(f)
    root = tree.getroot()

for Variable in root.findall('Variable'):
    var_type = Variable.get('type')  # type is an attribute, not a child element
    name = Variable.get('name')
    print(name, var_type)

Any ideas? Thanks in advance for any help.

Edit: Thanks to everyone who has commented. With your advice I have had a play and a search and come up with the following code,

with open(os.path.join(work_dir, "output.txt"), "w") as f:
    # getchildren() was removed in Python 3.9; iterate over each element directly
    for child1Tag in root:
        for child2Tag in child1Tag:
            for child3Tag in child2Tag:
                for child4Tag in child3Tag:
                    for child5Tag in child4Tag:
                        name = child5Tag.get('name')
                        if name == "Inboard_ED_mm":
                            print(child5Tag.attrib, file=f)
                            print(name, file=f)
                            print(child5Tag.text, file=f)

To return the following,

{'name': 'Inboard_ED_mm', 'state': 'Output', 'type': 'double[]'}
Inboard_ED_mm
17.154, 17.154, 17.154, 17.154, 17.154, 17.154, 17.154, 17.154, 17.154, 17.154, 17.154, 17.154, 17.154, 17.154

I know, not the best code in the world. Any ideas on how to streamline the code would be very welcome.

Answer

AJNeufeld · Oct 18, 2018

You say the above is an "extract" of the XML file. The structure of the XML is important. Does the above just sit inside the root node?

for Variable in root.findall('Variable'):
    print(Variable.get('name'), Variable.text)

Or does it exist somewhere deeper in the XML tree structure, at a known level?

for Variable in root.findall('Path/To/Variable'):
    print(Variable.get('name'), Variable.text)

Or does it exist at some unspecified deeper level in the XML tree structure?

for Variable in root.findall('.//Variable'):
    print(Variable.get('name'), Variable.text)

Demonstrating the last two:

>>> import xml.etree.ElementTree as ET
>>> src = """
<root>
 <SubNode>
  <Variable name='x'>17.154, ..., 17.154<Properties>...</Properties></Variable>
  <Variable name='y'>14.174, ..., 15.471<Properties>...</Properties></Variable>
 </SubNode>
</root>"""
>>> root = ET.fromstring(src)
>>> for Variable in root.findall('SubNode/Variable'):
        print(Variable.get('name'), Variable.text)


x 17.154, ..., 17.154
y 14.174, ..., 15.471
>>>
>>> for Variable in root.findall('.//Variable'):
        print(Variable.get('name'), Variable.text)


x 17.154, ..., 17.154
y 14.174, ..., 15.471
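
For completeness, the first case (a bare tag name) only matches direct children of the element it is called on, which is probably why the loop in the question printed nothing. A minimal sketch, reusing a cut-down version of the demonstration document (the `SubNode` wrapper is the same made-up tag as above, and the values are shortened):

```python
import xml.etree.ElementTree as ET

# Cut-down stand-in document; tag names other than Variable are illustrative.
src = """
<root>
 <SubNode>
  <Variable name='x'>17.154, 17.154</Variable>
  <Variable name='y'>14.174, 15.471</Variable>
 </SubNode>
</root>"""
root = ET.fromstring(src)

# A bare tag name searches only the direct children of root.
direct = root.findall('Variable')

# './/' searches all descendants of root, at any depth.
nested = root.findall('.//Variable')

print(len(direct))                      # 0
print([v.get('name') for v in nested])  # ['x', 'y']
```

An empty result from `findall` is not an error, so the `for` loop simply never runs and nothing is printed.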

Update

Based on your new/clearer/updated question, you are looking for:

for child in root.findall("*/*/*/*/Variable[@name='Inboard_ED_mm']"):
    print(child.attrib, file=f)
    print(child.get('name'), file=f)
    print(child.text, file=f)

or

for child in root.findall(".//Variable[@name='Inboard_ED_mm']"):
    print(child.attrib, file=f)
    print(child.get('name'), file=f)
    print(child.text, file=f)

If we knew the exact tag names of tags 1 through 4, we could give you a more exact XPath instead of relying on */*/*/*/.
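
As a self-contained sketch of the two searches above: the wrapper tags `Model`, `Assembly`, `Component` and `Variables` below are hypothetical placeholders for the unknown tags 1 through 4, and the element content is shortened. Splitting the text on commas to get numbers is an added convenience here, assuming the values are always comma-separated as in the extract.

```python
import xml.etree.ElementTree as ET

# Hypothetical nesting: the real tag names of levels 1 to 4 are not known.
src = """
<Root>
 <Model><Assembly><Component><Variables>
  <Variable name="Inboard_ED_mm" state="Output" type="double[]">17.154, 17.154, 17.154<Properties><Property name="index">25</Property></Properties></Variable>
  <Variable name="Other_mm" state="Output" type="double[]">1.0, 2.0<Properties/></Variable>
 </Variables></Component></Assembly></Model>
</Root>"""
root = ET.fromstring(src)

# Exact path, possible only once the intermediate tag names are known.
exact = root.findall(
    "Model/Assembly/Component/Variables/Variable[@name='Inboard_ED_mm']")

# Depth-independent search with the same attribute predicate.
anywhere = root.findall(".//Variable[@name='Inboard_ED_mm']")

# .text holds only the text before the first child element (<Properties>).
values = [float(v) for v in anywhere[0].text.split(',')]

print(exact == anywhere)  # True
print(values)             # [17.154, 17.154, 17.154]
```

The exact path fails silently (returns an empty list) if any level is misspelled, whereas the `.//` form keeps working if the nesting changes, at the cost of scanning the whole tree.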