When I feed a utf-8 encoded xml to an ExpatParser instance:
def test(filename):
parser = xml.sax.make_parser()
with codecs.open(filename, 'r', encoding='utf-8') as f:
for line in f:
parser.feed(line)
...I get the following:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "test.py", line 72, in search_test
parser.feed(line)
File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/xml/sax/expatreader.py", line 207, in feed
self._parser.Parse(data, isFinal)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xb4' in position 29: ordinal not in range(128)
I'm probably missing something obvious here. How do I change the parser's encoding from 'ascii' to 'utf-8'?
Your code fails in Python 2.6, but works in 3.0.
This does work in 2.6, presumably because it allows the parser itself to figure out the encoding (perhaps by reading the encoding optionally specified on the first line of the XML file, and otherwise defaulting to utf-8):
def test(filename):
parser = xml.sax.make_parser()
parser.parse(open(filename))