I have an XML file which looks like this:
<encspot>
  <file>
    <Name>some filename.mp3</Name>
    <Encoder>Gogo (after 3.0)</Encoder>
    <Bitrate>131</Bitrate>
    <Mode>joint stereo</Mode>
    <Length>00:02:43</Length>
    <Size>5,236,644</Size>
    <Frame>no</Frame>
    <Quality>good</Quality>
    <Freq.>44100</Freq.>
    <Frames>6255</Frames>
    ..... and so forth ......
  </file>
  <file>....</file>
</encspot>
I want to read it into a Python object, something like a list of dictionaries. Because the markup is absolutely fixed, I'm tempted to use regex (I'm quite good at using those). However, I thought I'd check whether someone knows how to easily avoid regexes here. I don't have much experience with SAX or other parsers, but I'm willing to learn.
I'm looking forward to being shown how this is done quickly without regexes in Python. Thanks for your help!
My beloved SD Chargers hat is off to you if you think a regex is easier than this:
#!/usr/bin/env python3
# xml.etree.cElementTree and Element.getchildren() were removed in Python 3.9;
# xml.etree.ElementTree and plain iteration over an element replace them.
import xml.etree.ElementTree as et

sxml = """
<encspot>
  <file>
    <Name>some filename.mp3</Name>
    <Encoder>Gogo (after 3.0)</Encoder>
    <Bitrate>131</Bitrate>
  </file>
  <file>
    <Name>another filename.mp3</Name>
    <Encoder>iTunes</Encoder>
    <Bitrate>128</Bitrate>
  </file>
</encspot>
"""

tree = et.fromstring(sxml)  # tree is the root <encspot> element

for el in tree.findall('file'):
    print('-------------------')
    for ch in el:  # iterate over the children of this <file>
        print('{:>15}: {:<30}'.format(ch.tag, ch.text))

print("\nan alternate way:")
el = tree.find('file[2]/Name')  # xpath: <Name> of the second <file>
print('{:>15}: {:<30}'.format(el.tag, el.text))
Output:
-------------------
           Name: some filename.mp3
        Encoder: Gogo (after 3.0)
        Bitrate: 131
-------------------
           Name: another filename.mp3
        Encoder: iTunes
        Bitrate: 128

an alternate way:
           Name: another filename.mp3
If your attraction to a regex is being terse, here is an equally incomprehensible bit of list comprehension to create a data structure:
[(ch.tag, ch.text) for e in tree.findall('file') for ch in e]
This creates a list of tuples of the XML children of <file>, in document order:
[('Name', 'some filename.mp3'),
('Encoder', 'Gogo (after 3.0)'),
('Bitrate', '131'),
('Name', 'another filename.mp3'),
('Encoder', 'iTunes'),
('Bitrate', '128')]
With a few more lines and a little more thought, you can create any data structure you want from XML with ElementTree. It is part of the Python standard library, so there is nothing extra to install.
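As an illustration of "a few more lines", here is a rough sketch that builds one dictionary per <file> and converts some values along the way. It assumes Python 3; the `converters` table and the `to_seconds` helper are my own inventions for this example, and the formats they expect (comma-grouped sizes, HH:MM:SS lengths) are taken from the sample in the question.

```python
import xml.etree.ElementTree as ET

sxml = """
<encspot>
  <file>
    <Name>some filename.mp3</Name>
    <Bitrate>131</Bitrate>
    <Length>00:02:43</Length>
    <Size>5,236,644</Size>
  </file>
</encspot>
"""

def to_seconds(hms):
    """Convert an 'HH:MM:SS' string to a number of seconds."""
    h, m, s = (int(part) for part in hms.split(':'))
    return h * 3600 + m * 60 + s

# Assumed per-tag converters; tags not listed here keep their text unchanged.
converters = {
    'Bitrate': int,
    'Size': lambda text: int(text.replace(',', '')),
    'Length': to_seconds,
}

records = []
for file_el in ET.fromstring(sxml).findall('file'):
    record = {}
    for child in file_el:
        convert = converters.get(child.tag, lambda text: text)
        record[child.tag] = convert(child.text)
    records.append(record)

print(records)
```

Each record is a plain dict, so `records[0]['Length']` is an integer number of seconds rather than a string.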
Edit
Code golf is on!
[{item.tag: item.text for item in ch} for ch in tree.findall('file')]
[ {'Bitrate': '131',
'Name': 'some filename.mp3',
'Encoder': 'Gogo (after 3.0)'},
{'Bitrate': '128',
'Name': 'another filename.mp3',
'Encoder': 'iTunes'}]
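Since the question is about a file on disk rather than a string, the same comprehension should work after `ET.parse`. A minimal sketch, assuming Python 3; the function name `read_encspot` and the temporary sample file are made up so the snippet can run on its own:

```python
import os
import tempfile
import xml.etree.ElementTree as ET

def read_encspot(path):
    """Return one dict per <file> element, mapping child tag to its text."""
    root = ET.parse(path).getroot()  # root is the <encspot> element
    return [{child.tag: child.text for child in file_el}
            for file_el in root.findall('file')]

# Write a small sample to a temporary file so the sketch is runnable as-is.
sample = """<encspot>
  <file>
    <Name>some filename.mp3</Name>
    <Bitrate>131</Bitrate>
  </file>
</encspot>"""
with tempfile.NamedTemporaryFile('w', suffix='.xml', delete=False) as tmp:
    tmp.write(sample)

records = read_encspot(tmp.name)
os.remove(tmp.name)  # clean up the temporary file
print(records)
```

With a real file you would just call `read_encspot` with its path and skip the temporary-file part.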
If your XML only has the file section, you can choose your golf. If your XML has other tags or other sections, you need to account for which section the children are in, and you will need to use findall.
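To show why the section matters, here is a small sketch with a second section added. The <summary> element is purely hypothetical (it is not in the question's file); it is only there to give the document another place where a <Name> tag can appear.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the <summary> section is invented for illustration.
sxml = """
<encspot>
  <summary>
    <Name>scan of my music folder</Name>
  </summary>
  <file>
    <Name>some filename.mp3</Name>
    <Bitrate>131</Bitrate>
  </file>
</encspot>
"""
root = ET.fromstring(sxml)

# iter() walks the whole tree, so it also picks up the <Name> inside <summary>.
all_names = [el.text for el in root.iter('Name')]

# findall('file') only matches <file> children of the root element.
file_names = [f.findtext('Name') for f in root.findall('file')]

print(all_names)
print(file_names)
```

Going through `findall('file')` first keeps the result limited to the section you actually care about.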
There is a tutorial on ElementTree at Effbot.org.