Assuming the following DataFrame:
   key.0  key.1  key.2  topic
1    abc    def    ghi      8
2    xab    xcd    xef      9
How can I combine the values of all the key.* columns into a single column 'key', with each value paired with the topic value from the row it came from? This is the result I want:
   topic  key
1      8  abc
2      8  def
3      8  ghi
4      9  xab
5      9  xcd
6      9  xef
Note that the number of key.N columns varies, depending on some external N.
You can melt your dataframe:
>>> keys = [c for c in df if c.startswith('key.')]
>>> pd.melt(df, id_vars='topic', value_vars=keys, value_name='key')
   topic variable  key
0      8    key.0  abc
1      9    key.0  xab
2      8    key.1  def
3      9    key.1  xcd
4      8    key.2  ghi
5      9    key.2  xef
The variable column also tells you which key.N column each key came from.
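For anyone who wants to run this end to end, here is a minimal self-contained sketch. It assumes the question's frame can be rebuilt as below (index 1 and 2, as printed there); the name `long_df` is only illustrative.

```python
import pandas as pd

# Rebuild the example frame from the question (assumed construction).
df = pd.DataFrame(
    {
        "key.0": ["abc", "xab"],
        "key.1": ["def", "xcd"],
        "key.2": ["ghi", "xef"],
        "topic": [8, 9],
    },
    index=[1, 2],
)

# Pick up however many key.N columns happen to exist.
keys = [c for c in df.columns if c.startswith("key.")]

# One row per (topic, key.N) pair; 'variable' holds the source column name.
long_df = pd.melt(df, id_vars="topic", value_vars=keys, value_name="key")
print(long_df)
```

Because `keys` is built from the column names, the same code should work unchanged when N is different.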
From v0.20, melt is a first-class function of the pd.DataFrame class:
>>> df.melt('topic', value_name='key').drop('variable', axis=1)
   topic  key
0      8  abc
1      9  xab
2      8  def
3      9  xcd
4      8  ghi
5      9  xef
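The melted rows come out grouped by source column rather than by topic. If you need the exact ordering and 1-based index shown in the question, one possible sketch is to follow the melt with a stable sort on topic. The frame construction and the name `result` are assumptions for illustration.

```python
import pandas as pd

# Example frame from the question (assumed construction).
df = pd.DataFrame(
    {
        "key.0": ["abc", "xab"],
        "key.1": ["def", "xcd"],
        "key.2": ["ghi", "xef"],
        "topic": [8, 9],
    },
    index=[1, 2],
)

result = (
    df.melt(id_vars="topic", value_name="key")
      .drop("variable", axis=1)
      # mergesort is stable, so rows within a topic keep their key.0, key.1, ... order
      .sort_values("topic", kind="mergesort")
      .reset_index(drop=True)
)

# Renumber from 1 to match the index in the question's desired output.
result.index = range(1, len(result) + 1)
print(result)
```

The stable sort matters here: a non-stable sort could shuffle the keys within each topic.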