Comparing string and unicode in Python 2.7.5

Kulawy Krul · Nov 14, 2013

I wonder why, when I define:

a = [u'k',u'ę',u'ą']

and then type:

'k' in a

I get True, while:

'ę' in a

will give me False?

It really gives me a headache, and it seems someone made this on purpose to make people mad...

Answer

aIKid · Nov 14, 2013

And why is this?

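In Python 2.x, comparing a unicode object with a byte string (str) makes Python decode the byte string implicitly, using the default ASCII codec. That works for 'k', but a non-ASCII character such as 'ę' cannot be decoded as ASCII, so the two are treated as unequal and a warning is raised: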

Warning (from warnings module):
  File "__main__", line 1
UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal

However, in Python 3.x this doesn't happen, as all string literals are unicode (str) objects by default.
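To make the mechanism concrete, here is a rough sketch that imitates the Python 2 behaviour by hand. The helper name py2_style_equal is made up for illustration, and the ASCII codec is assumed because it is the usual Python 2 default. It runs on both Python 2.7 and Python 3:

```python
# -*- coding: utf-8 -*-

def py2_style_equal(raw, text):
    """Hypothetical helper imitating Python 2's str == unicode comparison."""
    try:
        # Python 2 implicitly decodes the byte string with the default codec,
        # assumed here to be ASCII.
        decoded = raw.decode('ascii')
    except UnicodeDecodeError:
        # Python 2 emits a UnicodeWarning here and treats the values as unequal.
        return False
    return decoded == text

a = [u'k', u'\u0119', u'\u0105']        # u'k', u'ę', u'ą'
e_bytes = u'\u0119'.encode('utf-8')     # the bytes a UTF-8 source file holds for 'ę'

print(any(py2_style_equal(b'k', item) for item in a))      # True
print(any(py2_style_equal(e_bytes, item) for item in a))   # False
```

The second check fails because the bytes for 'ę' are not valid ASCII, not because the character is missing from the list.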

Solution?

You can either make the string unicode:

>>> u'ę' in a
True

Now you're comparing two unicode objects, not a unicode object to a byte string.
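If the byte string comes from somewhere else (a file, user input) and you can't just add the u prefix, you can decode it before comparing. This is only a sketch: to_text is a made-up helper name, and UTF-8 is an assumption about how the bytes were encoded.

```python
# -*- coding: utf-8 -*-

def to_text(value, encoding='utf-8'):
    """Hypothetical helper: return unicode text, decoding byte strings first."""
    if isinstance(value, bytes):    # in Python 2, bytes is an alias of str
        return value.decode(encoding)
    return value

a = [u'k', u'\u0119', u'\u0105']    # u'k', u'ę', u'ą'
raw = b'\xc4\x99'                   # UTF-8 bytes for 'ę'

print(to_text(raw) in a)     # True
print(to_text(b'k') in a)    # True
```

If the bytes are in a different encoding, pass that encoding instead; guessing the wrong one gives wrong characters or a UnicodeDecodeError.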

Or encode both sides with the same encoding, for example UTF-8, so that you compare two byte strings:

>>> c = u"ç"
>>> u'ç'.encode('utf-8') == c.encode('utf-8')
True
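Note that this only works if both sides use the same encoding. A small illustration (the choice of ISO-8859-2 as the second encoding is just an example):

```python
# -*- coding: utf-8 -*-

text = u'\u0119'                            # u'ę'
utf8_bytes = text.encode('utf-8')           # two bytes: C4 99
latin2_bytes = text.encode('iso-8859-2')    # one byte: EA

print(utf8_bytes == text.encode('utf-8'))   # True
print(utf8_bytes == latin2_bytes)           # False
```

The same character gives different bytes in different encodings, so comparing byte strings from mixed sources can fail even when the text is identical.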

Also, to use non-ASCII characters in your source file, you have to declare its encoding on the first or second line of the file:

# -*- coding: utf-8 -*-

# the rest of the program

Hope this helps!