Create multiindex from existing dataframe

puifais picture puifais · Jun 8, 2017 · Viewed 13.4k times · Source

I've spent hours browsing everywhere now to try to create a multiindex from dataframe in pandas. This is the dataframe I have (posting excel sheet mockup. I do have this in pandas dataframe):

have

And this is what I want:

want

I have tried

newmulti = currentDataFrame.set_index(['user_id','account_num'])

But it returns a dataframe, not a multiindex. Also, I could not figure out how to make 'user_id' level 0 and 'account_num' level 1. I think this must be trivial but I've read so many posts, tutorials, etc. and still could not figure it out. Partly because I'm a very visual person and most posts are not. Please help!

Answer

Alexander picture Alexander · Jun 8, 2017

You could simply use groupby in this case, which will create the multi-index automatically when it sums the sales along the requested columns.

df.groupby(['user_id', 'account_num', 'dates']).sales.sum().to_frame()

You should also be able to simply do this:

df.set_index(['user_id', 'account_num', 'dates'])

Although you probably want to avoid any duplicates (e.g. two or more rows with identical user_id, account_num and date values but different sales figures) by summing them, which is why I recommended using groupby.

If you need the multi-index, you can simply access viat new_df.index where new_df is the new dataframe created from either of the two operations above.

And user_id will be level 0 and account_num will be level 1.