I'm writing a Python program that parses XML files. I need to iterate over a NodeList, but I have an issue doing it with the "for node in NodeList" syntax.
Here is a code sample:
docToInclude = parse(node.getAttribute("file"))
print ("childNode count : " , len(docToInclude.documentElement.childNodes))
print ("childNodes : " , docToInclude.documentElement.childNodes)
print("")
for i in range(0, len(docToInclude.documentElement.childNodes)):
    print ("i = ", i , "nodeName = " + docToInclude.documentElement.childNodes[i].nodeName)
print("")
for elementNode in docToInclude.documentElement.childNodes :
    print ("node name : " , elementNode.nodeName)
    node.parentNode.insertBefore(elementNode, insertPosition)
Here is the output:
childNode count : 3
childNodes : [<DOM Text node "'\n\n\t'">, <DOM Element: messageList at 0x3a4e570>, <DOM Text node "'\n\n'">]
i = 0 nodeName = #text
i = 1 nodeName = messageList
i = 2 nodeName = #text
node name : #text
node name : #text
If I iterate with the "for node in NodeList" syntax, an element is skipped. Do you have any idea what causes this?
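You are moving elements out of childNodes while iterating over it. insertBefore() detaches each node from its current parent before inserting it elsewhere, so the childNodes list shrinks under the loop and the iterator skips the next item. The same thing happens with a plain list: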
>>> lst = [1, 2, 3]
>>> for i, elem in enumerate(lst):
...     print(i, elem)
...     del lst[i]
...
0 1
1 3
You'll have to iterate over a copy of the list instead; here I create the copy with the [:] slice notation:
for elementNode in docToInclude.documentElement.childNodes[:]:
    print ("node name : " , elementNode.nodeName)
    node.parentNode.insertBefore(elementNode, insertPosition)
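To see both behaviours side by side, here is a minimal self-contained sketch. The two XML strings and the move_children helper are hypothetical stand-ins for the files and variables in the question; only the minidom calls themselves come from the standard library.

```python
from xml.dom.minidom import parseString

# Hypothetical sample documents standing in for the files in the question.
INCLUDED = "<root>\n\n\t<messageList/>\n\n</root>"
HOST = "<host><include/></host>"


def move_children(copy_first):
    """Move the children of INCLUDED's root in front of <include/>.

    Returns the node names the loop visited and the number of children
    left behind in the source document.
    """
    included = parseString(INCLUDED)
    host = parseString(HOST)
    target = host.documentElement.firstChild  # the <include/> element
    children = included.documentElement.childNodes
    if copy_first:
        children = children[:]  # snapshot that insertBefore() cannot shrink
    visited = []
    for child in children:
        visited.append(child.nodeName)
        # insertBefore() first removes the node from its old parent
        target.parentNode.insertBefore(child, target)
    return visited, len(included.documentElement.childNodes)


print(move_children(copy_first=False))  # (['#text', '#text'], 1)
print(move_children(copy_first=True))   # (['#text', 'messageList', '#text'], 0)
```

Without the copy, messageList is never visited and stays behind in the source document, which matches the output shown in the question.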
Do yourself a big favour though and use the ElementTree API instead; that API is far more Pythonic and easier to use than the XML DOM API:
from xml.etree import ElementTree as ET
etree = ET.fromstring(data)
for element in etree.findall('messageList'):
    print(element)
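As a rough runnable version of the snippet above, assuming a sample document shaped like the one in the question (the data string here is made up for illustration): ElementTree keeps whitespace in each element's text and tail attributes rather than as separate child nodes, and findall() returns a new list, so modifying the tree inside the loop does not skip anything.

```python
from xml.etree import ElementTree as ET

# Hypothetical sample data; the snippet above leaves `data` undefined.
data = "<root>\n\n\t<messageList><message>hi</message></messageList>\n\n</root>"

root = ET.fromstring(data)

# findall() builds a separate list of matching direct children,
# so removing elements from root while looping over it is safe.
found = root.findall("messageList")
for element in found:
    print(element.tag)
    root.remove(element)

print(len(root))  # number of child elements left under root
```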