I've got a problem with strings that I get from one of my clients over xmlrpc. He sends me utf8 strings that are encoded twice :( so when I get them in Python I have a unicode object that has to be decoded one more time, but obviously Python doesn't allow that. I've notified my client, but I need a quick workaround for now, until he fixes it.
Raw string from tcp dump:
<string>Rafa\xc3\x85\xc2\x82</string>
This is converted into:
u'Rafa\xc5\x82'
The best I've come up with is:
eval(repr(u'Rafa\xc5\x82')[1:]).decode("utf8")
This results in the correct string, which is:
u'Rafa\u0142'
This works, but it is ugly as hell and cannot be used in production code. If anyone knows how to fix this problem in a more suitable way, please write. Thanks, Chris
>>> s = u'Rafa\xc5\x82'
>>> s.encode('raw_unicode_escape').decode('utf-8')
u'Rafa\u0142'
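For reuse, the same round trip could be wrapped in a small helper. The sketch below is only an illustration: the name fix_double_utf8 is made up, it is written in Python 3 syntax (str literals instead of u'...'), and it swaps in the latin-1 codec, which maps every code point below U+0100 to the byte with the same value. That is the behaviour the session above depends on for this input. It also assumes that returning the input unchanged is acceptable when the string does not look double-encoded.

```python
def fix_double_utf8(text):
    """Undo one extra layer of UTF-8 encoding (hypothetical helper)."""
    try:
        # latin-1 turns code points U+0000..U+00FF back into the
        # bytes 0x00..0xFF, recovering the original UTF-8 byte sequence.
        raw = text.encode("latin-1")
        # Decode those bytes as the UTF-8 they were meant to be.
        return raw.decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Assumption: anything that fails the round trip was not
        # double-encoded, so it is returned untouched.
        return text


if __name__ == "__main__":
    # The bytes from the tcp dump, decoded once, as xmlrpc delivers them.
    received = b"Rafa\xc3\x85\xc2\x82".decode("utf-8")
    print(fix_double_utf8(received) == "Rafa\u0142")
```

Because of the fallback, a string that is already correct (for example one that already contains U+0142) comes back as it went in, so the helper can stay in place while only some clients send broken data. It will still mis-repair any correct string whose characters happen to form valid UTF-8 when viewed as latin-1 bytes, so it is a stopgap rather than a general fix.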