How to display Chinese characters inside a pandas dataframe?

Daniel · Sep 3, 2016 · Viewed 12k times

I can read a CSV file in which one column contains Chinese characters (the other columns contain English text and numbers). However, the Chinese characters don't display correctly.

I loaded the CSV file with pd.read_csv().

Neither display(data06_16) nor data06_16.head() displays the Chinese characters correctly.

I tried adding the following lines to my .bash_profile:

export LC_ALL=zh_CN.UTF-8
export LANG=zh_CN.UTF-8

export LC_ALL=en_US.UTF-8
export LANG=en_US.UTF-8

but it doesn't help.

I have also tried passing the encoding argument to pd.read_csv():

pd.read_csv('data.csv', encoding='utf_8')
pd.read_csv('data.csv', encoding='utf_16')
pd.read_csv('data.csv', encoding='utf_32')

None of these works.
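
A plausible reason the UTF-8 attempt fails, sketched here under the assumption that the file was saved in a legacy Chinese encoding such as GBK: bytes written as GBK are generally not valid UTF-8, so a strict decode raises an error instead of producing text. The sample string and the `try_decode` helper are hypothetical, made up for illustration; only `str.encode`, `bytes.decode`, and the codec names are standard Python.

```python
# Hypothetical sample text; the real file's contents are not shown in the question.
sample = "股票,价格\n浦发银行,10.5\n"

# The bytes a GBK-encoded CSV file would contain for that text (assumption:
# the file on disk was written with the GBK codec).
raw = sample.encode("gbk")


def try_decode(data: bytes, encoding: str):
    """Return the decoded text, or None if the bytes are invalid in `encoding`."""
    try:
        return data.decode(encoding)
    except UnicodeError:
        return None


# Decoding GBK bytes as UTF-8 is rejected by the strict decoder.
utf8_result = try_decode(raw, "utf_8")

# Decoding with the codec that produced the bytes recovers the original text.
gbk_result = try_decode(raw, "gbk")

print("utf_8 decode succeeded:", utf8_result is not None)
print("gbk decode matches original:", gbk_result == sample)
```

Note that a decode which raises no error is not proof that the encoding is right: some codecs accept almost any byte sequence and simply return garbage, so knowing how the file was produced is more reliable than trial and error.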

How can I display the Chinese characters properly?

Answer

Daniel · Sep 4, 2016

I just remembered that the source dataset was created using encoding='GBK', so I tried again using

data06_16 = pd.read_csv("../data/stocks1542monthly.csv", encoding="GBK")

Now, I can see all the Chinese characters.
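
The fix can be reproduced end to end with a small stand-in file. This is only a sketch, not the original dataset: the column names, the rows, and the `sample_gbk.csv` filename are invented for illustration. It writes a GBK-encoded CSV and reads it back with the same `encoding="GBK"` argument used above.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in data; the real stocks1542monthly.csv is not shown here.
rows = "code,name,close\n600000,浦发银行,16.5\n600036,招商银行,18.2\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample_gbk.csv")

    # Write the file as GBK, mirroring how the source dataset was created.
    with open(path, "w", encoding="gbk", newline="") as f:
        f.write(rows)

    # Telling pandas the file's actual encoding lets it decode the bytes.
    df = pd.read_csv(path, encoding="GBK")

# The Chinese column now holds ordinary Python strings.
print(df.head())
```

Once the text is decoded correctly at read time, display(df) and df.head() need no locale changes to show the characters, provided the terminal or notebook font can render them.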

Thanks, guys!