How do I check if a string is unicode or ascii?

TIMEX · Feb 13, 2011

What do I have to do in Python to figure out which encoding a string has?

Answer

Greg Hewgill · Feb 13, 2011

In Python 3, all strings are sequences of Unicode characters. There is a bytes type that holds raw bytes.
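
As a minimal sketch of that Python 3 distinction, the helper below reports whether a value is text or raw bytes. The name `describe` is hypothetical and is not part of the original answer.

```python
def describe(value):
    """Return a short label for the Python 3 type of value."""
    if isinstance(value, str):
        # In Python 3, str always holds Unicode text.
        return "text (str)"
    if isinstance(value, (bytes, bytearray)):
        # bytes and bytearray hold raw bytes with no encoding attached.
        return "raw bytes"
    return "not a string"
```

For example, `describe("héllo")` gives "text (str)" and `describe(b"hello")` gives "raw bytes".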

In Python 2, a string may be of type str (a byte string) or of type unicode (a text string). You can tell which with code like this:

def whatisthis(s):
    # Python 2 only: the unicode type does not exist in Python 3.
    if isinstance(s, str):
        print "ordinary string"
    elif isinstance(s, unicode):
        print "unicode string"
    else:
        print "not a string"

This does not distinguish "Unicode or ASCII"; it only distinguishes Python types. A Unicode string may consist purely of characters in the ASCII range, and a bytestring may contain ASCII, encoded Unicode, or even non-textual data.
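
To look at content rather than type, something like the following Python 3 sketch could be used. The helper names `is_ascii_text` and `try_decode` are hypothetical. Note that a successful decode only shows the bytes are valid in that encoding, not that they were actually produced with it.

```python
def is_ascii_text(text):
    """True if every character of a str is in the ASCII range (code point < 128)."""
    return all(ord(ch) < 128 for ch in text)


def try_decode(data, encoding="utf-8"):
    """Return the decoded str, or None if data is not valid in that encoding."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None
```

Here `is_ascii_text("hello")` is true and `is_ascii_text("héllo")` is false, while `try_decode("é".encode("utf-8"), "ascii")` returns None because the encoded bytes fall outside the ASCII range. On Python 3.7 and later, the built-in `str.isascii()` method performs the same check as `is_ascii_text`.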