I have a dataframe that looks like the following:
import pandas as pd
datelisttemp = pd.date_range('1/1/2014', periods=3, freq='D')
s = list(datelisttemp)*3
s.sort()
df = pd.DataFrame({'BORDER':['GERMANY','FRANCE','ITALY','GERMANY','FRANCE','ITALY','GERMANY','FRANCE','ITALY' ], 'HOUR1':[2 ,2 ,2 ,4 ,4 ,4 ,6 ,6, 6],'HOUR2':[3 ,3 ,3, 5 ,5 ,5, 7, 7, 7], 'HOUR3':[8 ,8 ,8, 12 ,12 ,12, 99, 99, 99]}, index=s)
This gives me:
Out[458]: df
BORDER HOUR1 HOUR2 HOUR3
2014-01-01 GERMANY 2 3 8
2014-01-01 FRANCE 2 3 8
2014-01-01 ITALY 2 3 8
2014-01-02 GERMANY 4 5 12
2014-01-02 FRANCE 4 5 12
2014-01-02 ITALY 4 5 12
2014-01-03 GERMANY 6 7 99
2014-01-03 FRANCE 6 7 99
2014-01-03 ITALY 6 7 99
I want the final dataframe to look something like:
HOUR GERMANY FRANCE ITALY
2014-01-01 1 2 2 2
2014-01-01 2 3 3 3
2014-01-01 3 8 8 8
2014-01-02 1 4 4 4
2014-01-02 2 5 5 5
2014-01-02 3 12 12 12
2014-01-03 1 6 6 6
2014-01-03 2 7 7 7
2014-01-03 3 99 99 99
I've done the following but I'm not quite there:
df['date_col'] = df.index
df2 = melt(df, id_vars=['date_col','BORDER'])
#Can I keep the same index after melt or do I have to set an index like below?
df2.set_index(['date_col', 'variable'], inplace=True, drop=True)
df2 = df2.sort()
df
Out[465]: df2
BORDER value
date_col variable
2014-01-01 HOUR1 GERMANY 2
HOUR1 FRANCE 2
HOUR1 ITALY 2
HOUR2 GERMANY 3
HOUR2 FRANCE 3
HOUR2 ITALY 3
HOUR3 GERMANY 8
HOUR3 FRANCE 8
HOUR3 ITALY 8
2014-01-02 HOUR1 GERMANY 4
HOUR1 FRANCE 4
HOUR1 ITALY 4
HOUR2 GERMANY 5
HOUR2 FRANCE 5
HOUR2 ITALY 5
HOUR3 GERMANY 12
HOUR3 FRANCE 12
HOUR3 ITALY 12
2014-01-03 HOUR1 GERMANY 6
HOUR1 FRANCE 6
HOUR1 ITALY 6
HOUR2 GERMANY 7
HOUR2 FRANCE 7
HOUR2 ITALY 7
HOUR3 GERMANY 99
HOUR3 FRANCE 99
HOUR3 ITALY 99
I thought I could unstack df2 to get something that resembles my final dataframe but I get all sorts of errors. I have also tried to pivot this dataframe but can't quite get what I want.
We want values (e.g. 'GERMANY'
) to become column names, and column names (e.g. 'HOUR1'
) to become values -- a swap of sorts.
The stack
method turns column names into index values, and
the unstack
method turns index values into column names.
So by shifting the values into the index, we can use stack
and unstack
to perform the swap.
import pandas as pd
datelisttemp = pd.date_range('1/1/2014', periods=3, freq='D')
s = list(datelisttemp)*3
s.sort()
df = pd.DataFrame({'BORDER':['GERMANY','FRANCE','ITALY','GERMANY','FRANCE','ITALY','GERMANY','FRANCE','ITALY' ], 'HOUR1':[2 ,2 ,2 ,4 ,4 ,4 ,6 ,6, 6],'HOUR2':[3 ,3 ,3, 5 ,5 ,5, 7, 7, 7], 'HOUR3':[8 ,8 ,8, 12 ,12 ,12, 99, 99, 99]}, index=s)
df = df.set_index(['BORDER'], append=True)
df.columns.name = 'HOUR'
df = df.unstack('BORDER')
df = df.stack('HOUR')
df = df.reset_index('HOUR')
df['HOUR'] = df['HOUR'].str.replace('HOUR', '').astype('int')
print(df)
yields
BORDER HOUR FRANCE GERMANY ITALY
2014-01-01 1 2 2 2
2014-01-01 2 3 3 3
2014-01-01 3 8 8 8
2014-01-02 1 4 4 4
2014-01-02 2 5 5 5
2014-01-02 3 12 12 12
2014-01-03 1 6 6 6
2014-01-03 2 7 7 7
2014-01-03 3 99 99 99