Decoding and encoding a Hebrew string in Python

user1767774 · Apr 24, 2015

I am trying to encode and decode the Hebrew string "שלום". However, when I print the string after encoding it, I get gibberish:

>>> word = "שלום"
>>> word = word.decode('UTF-8')
>>> word
u'\u05e9\u05dc\u05d5\u05dd'
>>> print word
שלום
>>> word = word.encode('UTF-8')
>>> word
'\xd7\xa9\xd7\x9c\xd7\x95\xd7\x9d'
>>> print word
׳©׳׳•׳
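The encoded bytes in the session are valid UTF-8. The gibberish appears when something decodes those bytes with a different encoding. The Python 3 sketch below reproduces the exact characters shown. It rests on one assumption, inferred from the output and not stated in the question: the terminal was using the Windows Hebrew code page, cp1255, instead of UTF-8.

```python
# Python 3 sketch: reproduce the gibberish from the question.
# Assumption: the terminal decoded output as cp1255 (Windows Hebrew).
# This is inferred from the characters shown, not stated in the question.
word = "שלום"
utf8_bytes = word.encode("utf-8")

# These are exactly the bytes the interactive session displays.
assert utf8_bytes == b"\xd7\xa9\xd7\x9c\xd7\x95\xd7\x9d"

# Interpret the UTF-8 bytes with the wrong codec. In cp1255, 0xD7 is the
# Hebrew geresh (U+05F3), 0xA9 is the copyright sign and 0x95 is a bullet.
# 0x9C and 0x9D are undefined, so errors="ignore" drops them.
mojibake = utf8_bytes.decode("cp1255", errors="ignore")

# Geresh, copyright sign, geresh, geresh, bullet, geresh: the same
# characters the question's output shows.
assert mojibake == "\u05f3\u00a9\u05f3\u05f3\u2022\u05f3"

# Decoding the same bytes as UTF-8 recovers the original word.
assert utf8_bytes.decode("utf-8") == word
```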

How should I do it properly?

Thanks.

Answer

jonhurlock · Apr 24, 2015

You'll have to make sure you have the right encoding in your environment (shell or script). If you're using a script, include the following at the top of the file:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

This tells Python that the source file is encoded in UTF-8. You may also find that your terminal accepts only ASCII, so make sure it is set up to support UTF-8.
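To see what your environment actually uses, you can ask Python directly. The following is a Python 3 sketch. `sys.stdout.encoding` is the codec Python applies when printing text, and `locale.getpreferredencoding(False)` is the locale's default. The helper name `can_encode` is hypothetical and only for illustration. It reports whether a given codec can represent the string at all.

```python
import locale
import sys
from typing import Optional


def can_encode(text: str, encoding: Optional[str]) -> bool:
    """Hypothetical helper: True if every character of text fits in encoding."""
    if encoding is None:
        # The stream did not report an encoding, so we cannot tell.
        return False
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


word = "שלום"

# getattr guards against replacement streams that lack an .encoding attribute.
stdout_encoding = getattr(sys.stdout, "encoding", None)
print("stdout encoding:", stdout_encoding)
print("locale preferred encoding:", locale.getpreferredencoding(False))

# An ASCII-only terminal cannot represent Hebrew letters; UTF-8 can.
print("UTF-8 can represent the word:", can_encode(word, "utf-8"))
print("ASCII can represent the word:", can_encode(word, "ascii"))
print("stdout can represent the word:", can_encode(word, stdout_encoding))
```

If the stdout line prints `False`, printing the word will fail or come out garbled until the terminal (or the `PYTHONIOENCODING` environment variable) is set to an encoding that covers Hebrew.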

>>> word = "שלום"
>>> word
'\xd7\xa9\xd7\x9c\xd7\x95\xd7\x9d'
>>> print word
שלום
>>> word = word.decode('UTF-8')
>>> word
u'\u05e9\u05dc\u05d5\u05dd'
>>> print word
שלום
>>> word = word.encode('UTF-8')
>>> word
'\xd7\xa9\xd7\x9c\xd7\x95\xd7\x9d'
>>> print word
שלום
>>>
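The session above is Python 2, where a plain `"..."` literal is a byte string and `print word` is a statement. In Python 3, a `"..."` literal is already Unicode text (`str`), so the first `.decode()` step goes away. Here is a minimal sketch of the same round trip in Python 3. The byte values match the ones shown in the session.

```python
# Python 3 sketch of the encode/decode round trip from the session above.
word = "שלום"  # already Unicode text (str), no .decode() needed

# str -> bytes: encoding produces the UTF-8 byte sequence.
encoded = word.encode("utf-8")
assert encoded == b"\xd7\xa9\xd7\x9c\xd7\x95\xd7\x9d"

# Each of these Hebrew letters takes two bytes in UTF-8.
assert len(word) == 4 and len(encoded) == 8

# bytes -> str: decoding returns the original text.
decoded = encoded.decode("utf-8")
assert decoded == word
```

In Python 3, `print(decoded)` lets Python encode the text for the terminal using `sys.stdout.encoding`. `print(encoded)` shows the bytes object's representation (`b'\xd7\xa9...'`) and does not render the Hebrew letters.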