Python XML Parsing without root

sgp · May 27, 2014 · Viewed 7.2k times

I want to parse a fairly large XML-like file that has no root element. The file looks like this:

<tag1>
<tag2>
</tag2>
</tag1>

<tag1>
<tag3/>
</tag1>

I tried using ElementTree, but it returned a "no root" error. Is there another Python library that can parse this file? Thanks in advance! :)

PS: I also tried wrapping the entire file in an extra tag and then parsing it with ElementTree. However, I would prefer a more efficient method that does not require altering the original XML file.

Answer

falsetru · May 27, 2014

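xml.etree.ElementTree.fromstringlist accepts an iterable that yields strings, so the file's lines can be wrapped in a root element on the fly, without altering the file on disk.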

Using it with itertools.chain:

import itertools
import xml.etree.ElementTree as ET
# import xml.etree.cElementTree as ET  # Python 2 only; removed in Python 3.9

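with open('xml-like-file.xml') as f:
    # Wrap the file's lines in a synthetic <root> element; the file is not modified.
    # The tags are passed as one-item lists so chain() yields them whole
    # rather than character by character.
    it = itertools.chain(['<root>'], f, ['</root>'])
    root = ET.fromstringlist(it)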

# Do something with `root`
root.find('.//tag3')
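
Since the file is described as fairly huge, building the whole tree in memory may not be desirable. The following is only a sketch of a streaming variant of the same wrapping idea, using xml.etree.ElementTree.XMLPullParser (Python 3.4+). The helper name iter_top_level and its wrapper parameter are made up for this illustration, and it assumes the file contains no XML declaration or other content that cannot appear inside an element.

```python
import itertools
import xml.etree.ElementTree as ET


def iter_top_level(lines, wrapper="root"):
    """Yield each top-level element of a rootless XML-like stream, one at a time.

    `iter_top_level` and `wrapper` are hypothetical names for this sketch.
    """
    parser = ET.XMLPullParser(events=("start", "end"))
    root = None   # the synthetic wrapper element
    depth = 0     # current nesting depth, counting the wrapper as 1

    # None is a sentinel meaning "input finished, close the parser".
    pieces = itertools.chain([f"<{wrapper}>"], lines, [f"</{wrapper}>"], [None])
    for piece in pieces:
        if piece is None:
            parser.close()
        else:
            parser.feed(piece)
        for event, elem in parser.read_events():
            if event == "start":
                if root is None:
                    root = elem
                depth += 1
            else:
                depth -= 1
                if depth == 1:
                    # A direct child of the wrapper has just been closed.
                    yield elem
                    # Detach it so the wrapper does not accumulate children.
                    root.remove(elem)
```

It could be used like this, processing one top-level element at a time:

```python
with open('xml-like-file.xml') as f:
    for elem in iter_top_level(f):
        print(elem.tag, elem.find('tag3') is not None)
```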