All nodeValue fields are None when parsing XML

The.Anti.9 · Jan 26, 2009 · Viewed 7.1k times

I'm building a simple web-based RSS reader in Python, but I'm having trouble parsing the XML. I started out by trying some stuff in the Python command line.

>>> from xml.dom import minidom
>>> import urllib2
>>> url = 'http://www.digg.com/rss/index.xml'
>>> xmldoc = minidom.parse(urllib2.urlopen(url))
>>> channelnode = xmldoc.getElementsByTagName("channel")
>>> titlenode = channelnode[0].getElementsByTagName("title")
>>> print titlenode[0]
<DOM Element: title at 0xb37440> 
>>> print titlenode[0].nodeValue 
None

I played around with this for a while, but the nodeValue of everything seems to be None. Yet if you look at the XML, there definitely are values there. What am I doing wrong?

Answer

unbeknown · Jan 26, 2009

For RSS feeds you should try the Universal Feed Parser library. It simplifies the handling of RSS feeds immensely.

import feedparser
d = feedparser.parse('http://www.digg.com/rss/index.xml')
title = d.channel.title
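
As for why minidom printed None in the question: in the DOM, an element node's nodeValue is always None. The text lives in the element's child text nodes, so you have to read it from there. Below is a minimal sketch of that, not a full RSS parser. SAMPLE_RSS and the element_text helper are made up for illustration; the inline sample stands in for the Digg feed so the snippet runs without network access.

```python
from xml.dom import minidom

# Hypothetical inline feed standing in for the Digg RSS document.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example Feed</title>
    <item><title>First story</title></item>
  </channel>
</rss>"""


def element_text(element):
    """Concatenate the text and CDATA children of a DOM element (illustrative helper)."""
    return "".join(
        child.data
        for child in element.childNodes
        if child.nodeType in (child.TEXT_NODE, child.CDATA_SECTION_NODE)
    )


xmldoc = minidom.parseString(SAMPLE_RSS)
channel = xmldoc.getElementsByTagName("channel")[0]
# getElementsByTagName returns matches in document order,
# so the channel's own <title> comes before any item titles.
title = channel.getElementsByTagName("title")[0]

# The element itself has no value; this is the None from the question.
element_node_value = title.nodeValue

# The text sits in the element's first child, a text node.
first_child_value = title.firstChild.nodeValue

# Joining all text children is safer when the text is split across nodes.
channel_title = element_text(title)
print(channel_title)
```

In the original session the equivalent would be titlenode[0].firstChild.nodeValue. Note that firstChild is None for an empty element, so the helper that walks childNodes is the more forgiving option.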