Pandas DataFrame stack multiple column values into single column

borice picture borice · Dec 19, 2015 · Viewed 28.9k times · Source

Assuming the following DataFrame:

  key.0 key.1 key.2  topic
1   abc   def   ghi      8
2   xab   xcd   xef      9

How can I combine the values of all the key.* columns into a single column 'key', that's associated with the topic value corresponding to the key.* columns? This is the result I want:

   topic  key
1      8  abc
2      8  def
3      8  ghi
4      9  xab
5      9  xcd
6      9  xef

Note that the number of key.N columns is variable on some external N.

Answer

Alexander picture Alexander · Dec 19, 2015

You can melt your dataframe:

>>> keys = [c for c in df if c.startswith('key.')]
>>> pd.melt(df, id_vars='topic', value_vars=keys, value_name='key')

   topic variable  key
0      8    key.0  abc
1      9    key.0  xab
2      8    key.1  def
3      9    key.1  xcd
4      8    key.2  ghi
5      9    key.2  xef

It also gives you the source of the key.


From v0.20, melt is a first class function of the pd.DataFrame class:

>>> df.melt('topic', value_name='key').drop('variable', 1)

   topic  key
0      8  abc
1      9  xab
2      8  def
3      9  xcd
4      8  ghi
5      9  xef