Removing u in list

Brian Li · Mar 19, 2012 · Viewed 165.5k times

I have read up on removing the 'u' character from a list, but I am using Google App Engine and it does not seem to work!

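import ast
import json

from google.appengine.ext import db

def get(self):
    players = db.GqlQuery("SELECT * FROM Player")
    print players
    playerInfo = {}

    test = []

    for player in players:
        email = player.email
        gem = str(player.gem)
        # Each entry is a hand-built string, not a dict
        a = "{email:" + email + ",gem:" + gem + "}"

        test.append(a)

    # The return value is discarded here, so test is left unchanged
    ast.literal_eval(json.dumps(test))
    print test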

Final output:

[u'{email:[email protected],gem:0}', u'{email:test,gem:0}', u'{email:test,gem:0}', u'{email:test,gem:0}', u'{email:test,gem:0}', u'{email:test1,gem:0}']

Answer

unwind · Mar 19, 2012

That 'u' is part of the external representation of the string; it marks the value as a Unicode string as opposed to a byte string. It's not in the string; it's part of the type.
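A quick way to convince yourself that the prefix is not part of the text is to inspect the string directly. This is only a sketch: the value `name` is a hypothetical stand-in for one of the emails in the question, and the snippet is written so that it behaves the same on Python 2 and Python 3 apart from the `repr` line.

```python
import json

# Hypothetical sample value, mirroring one of the emails in the question.
name = u"test"

# The prefix belongs to the literal syntax and the repr, not to the text.
print(len(name))          # 4 characters: t, e, s, t
print(name[0] == u"t")    # True: the first character is 't', not 'u'
print(name)               # printing the string shows only the text

# Python 2 shows u'test' here; Python 3 shows 'test'.
print(repr(name))

# Serialized JSON carries only the text, with no prefix.
encoded = json.dumps([name])
print(encoded)            # ["test"]
```

Printing a whole list shows the `repr` of each element, which is why the prefix turns up in the question's output even though the strings themselves are fine.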

As an example, you can create a new Unicode string literal by using the same syntax. For instance:

>>> sandwich = u"smörgås"
>>> sandwich
u'sm\xf6rg\xe5s'

This creates a new Unicode string whose value is the Swedish word for sandwich. You can see that the non-ASCII characters are shown as escapes of their Unicode code points: ö is \xf6 and å is \xe5. The 'u' prefix appears, just as in your example, to signify that this string holds Unicode text.
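If you want to check the code points yourself, something like the following should do. It is a small sketch that spells the word with escapes so the source file can stay plain ASCII; the index positions are simply where ö and å fall in this particular word.

```python
# Same word as above, written with escapes so no source encoding is needed.
sandwich = u"sm\xf6rg\xe5s"

print(len(sandwich))            # 7 characters
print(hex(ord(sandwich[2])))    # 0xf6, the code point of o-umlaut
print(hex(ord(sandwich[5])))    # 0xe5, the code point of a-ring

# \xNN and \u00NN are two spellings of the same code point.
print(sandwich == u"sm\u00f6rg\u00e5s")   # True
```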

To get rid of the prefix, you need to encode the Unicode string into some byte-oriented representation, such as UTF-8. You can do that with e.g.:

>>> sandwich.encode("utf-8")
'sm\xc3\xb6rg\xc3\xa5s'

Here, we get a new string without the prefix 'u', since this is a byte string. It contains the bytes representing the characters of the Unicode string, with the Swedish characters resulting in multiple bytes due to the wonders of the UTF-8 encoding.
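To make the "multiple bytes" point concrete, here is a short sketch comparing the character count with the byte count and checking that decoding gets the original text back. The name `utf8_bytes` is just an illustrative choice. Note that on Python 3 the encoded value is a `bytes` object whose `repr` carries a `b` prefix, so there the plain `str` is already the prefix-free form.

```python
sandwich = u"sm\xf6rg\xe5s"

# Encode the text into UTF-8 bytes.
utf8_bytes = sandwich.encode("utf-8")

print(len(sandwich))      # 7 characters
print(len(utf8_bytes))    # 9 bytes: the two Swedish letters take two bytes each

# The byte values match the ones shown in the answer.
print(utf8_bytes == b"sm\xc3\xb6rg\xc3\xa5s")    # True

# Decoding with the same encoding restores the original text.
print(utf8_bytes.decode("utf-8") == sandwich)    # True
```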