Python / Minidom: Iterating over a NodeList

DenisB · Sep 20, 2012

I'm writing a Python program that parses XML files. I need to iterate over a NodeList, but I have an issue doing it with the "for node in NodeList" syntax.

Here is a code sample:

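from xml.dom.minidom import parse

# node is the element whose "file" attribute names the document to include
docToInclude = parse(node.getAttribute("file"))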

print ("childNode count : " , len(docToInclude.documentElement.childNodes))
print ("childNodes : " , docToInclude.documentElement.childNodes)
print("")

for i in range(0, len(docToInclude.documentElement.childNodes)):
    print ("i  = ", i , "nodeName = " + docToInclude.documentElement.childNodes[i].nodeName)

print("")

for elementNode in docToInclude.documentElement.childNodes :
    print ("node name : " ,  elementNode.nodeName)
    node.parentNode.insertBefore(elementNode, insertPosition)

Here is the output:

childNode count :  3
childNodes :  [<DOM Text node "'\n\n\t'">, <DOM Element: messageList at 0x3a4e570>, <DOM Text node "'\n\n'">]

i  =  0 nodeName = #text
i  =  1 nodeName = messageList
i  =  2 nodeName = #text

node name :  #text
node name :  #text

If I iterate with the "for node in NodeList" syntax, an element is skipped. Do you have any idea what causes this problem?

Answer

Martijn Pieters · Sep 20, 2012

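You are moving elements out of childNodes while iterating over it: insertBefore() detaches a node from its current parent before inserting it at the new position. This changes the childNodes list, so the loop skips entries, just as it does with a plain list: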

>>> lst = [1, 2, 3]
>>> for i, elem in enumerate(lst):
...    print(i, elem)
...    del lst[i]
...    
0 1
1 3

You'll have to iterate over a copy of the list instead; here I create a copy of the list by using the [:] slice notation:

for elementNode in docToInclude.documentElement.childNodes[:]:
    print ("node name : " ,  elementNode.nodeName)
    node.parentNode.insertBefore(elementNode, insertPosition) 
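
To see the difference end to end, here is a minimal self-contained sketch. The two XML strings and the move_children helper are hypothetical stand-ins for the asker's files and loop, not part of the original code; list() is used for the copy, which should behave the same as the [:] slice here.

```python
from xml.dom.minidom import parseString

# Hypothetical stand-in for the included file from the question.
INCLUDED_XML = "<messages>\n\n\t<messageList/>\n\n</messages>"


def move_children(snapshot):
    """Move the included root's children in front of <include/>.

    Returns the node names the loop visited and the number of children
    left behind in the included document.
    """
    target = parseString("<root><include/></root>")
    included = parseString(INCLUDED_XML)
    marker = target.documentElement.firstChild  # the <include/> element

    children = included.documentElement.childNodes
    if snapshot:
        children = list(children)  # static copy, unaffected by the moves below

    visited = []
    for child in children:
        visited.append(child.nodeName)
        # insertBefore() removes child from its old parent first,
        # which shrinks the live childNodes list.
        marker.parentNode.insertBefore(child, marker)
    return visited, len(included.documentElement.childNodes)


print(move_children(snapshot=False))  # (['#text', '#text'], 1)
print(move_children(snapshot=True))   # (['#text', 'messageList', '#text'], 0)
```

Without the copy, messageList is never visited and stays in the included document; with the copy, all three nodes are moved.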

Do yourself a big favour though and use the ElementTree API instead; that API is far more Pythonic and easier to use than the XML DOM API:

from xml.etree import ElementTree as ET

# data holds the XML document as a string
etree = ET.fromstring(data)
for element in etree.findall('messageList'):
    print(element)
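
For the include use case from the question, a rough ElementTree sketch might look like the following. The sample documents, the "messages.xml" file name, and the expand_include helper are all assumptions made for illustration. Note that ElementTree elements have no parent pointer, so inserting a child elsewhere does not remove it from the included root; iterating over list(included_root) is still a safe habit.

```python
from xml.etree import ElementTree as ET

# Hypothetical documents; in the question these would come from files.
target = ET.fromstring('<root><a/><include file="messages.xml"/><b/></root>')
included = ET.fromstring("<messages><messageList/><messageList/></messages>")


def expand_include(parent, include_el, included_root):
    """Replace include_el in parent with the children of included_root."""
    index = list(parent).index(include_el)
    parent.remove(include_el)
    # Iterate over a static copy of the children while inserting them.
    for offset, child in enumerate(list(included_root)):
        parent.insert(index + offset, child)


expand_include(target, target.find("include"), included)
tags = [child.tag for child in target]
print(tags)  # ['a', 'messageList', 'messageList', 'b']
```

This sketch ignores whitespace text (the tail of the removed include element is dropped), which may or may not matter for your output.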