Merge two dataframes with multi-index

Andreuccio picture Andreuccio · Oct 3, 2017 · Viewed 11.7k times · Source

I have seen several posts about this but I could not get my head around how merge, join and concat would deal with this. How can I merge two dataframes to find matching indexes?

in:

import pandas as pd
import numpy as np
row_x1 = ['a1','b1','c1']
row_x2 = ['a2','b2','c2']
row_x3 = ['a3','b3','c3']
row_x4 = ['a4','b4','c4']
index_arrays = [np.array(['first', 'first', 'second', 'second']), np.array(['one','two','one','two'])]
df1 = pd.DataFrame([row_x1,row_x2,row_x3,row_x4], columns=list('ABC'), index=index_arrays)
print(df1)

out:

             A   B   C
first  one  a1  b1  c1
       two  a2  b2  c2
second one  a3  b3  c3
       two  a4  b4  c4

in:

row_y1 = ['d1','e1','f1']
row_y2 = ['d2','e2','f2']
df2 = pd.DataFrame([row_y1,row_y2], columns=list('DEF'), index=['first','second'])
print(df2)

out

         D   E   F
first   d1  e1  f1
second  d2  e2  f2

in other words, how can I merge them to achieve df3 (as follows)?

in

row_x1 = ['a1','b1','c1']
row_x2 = ['a2','b2','c2']
row_x3 = ['a3','b3','c3']
row_x4 = ['a4','b4','c4']
row_y1 = ['d1','e1','f1']
row_y2 = ['d2','e2','f2']

row_z1 = row_x1 + row_y1
row_z2 = row_x2 + row_y1
row_z3 = row_x3 + row_y2
row_z4 = row_x4 + row_y2

df3 = pd.DataFrame([row_z1,row_z2,row_z3,row_z4], columns=list('ABCDEF'), index=index_arrays)
print(df3)

out

             A   B   C   D   E   F
first  one  a1  b1  c1  d1  e1  f1
       two  a2  b2  c2  d1  e1  f1
second one  a3  b3  c3  d2  e2  f2
       two  a4  b4  c4  d2  e2  f2

Answer

piRSquared picture piRSquared · Oct 3, 2017

Option 1
Use pd.DataFrame.reindex + pd.DataFrame.join
reindex has a convenient level parameter that allows you to expand on the index levels not present.

df1.join(df2.reindex(df1.index, level=0))

             A   B   C   D   E   F
first  one  a1  b1  c1  d1  e1  f1
       two  a2  b2  c2  d1  e1  f1
second one  a3  b3  c3  d2  e2  f2
       two  a4  b4  c4  d2  e2  f2

Option 2
You can rename your axes and join will work

df1.rename_axis(['a', 'b']).join(df2.rename_axis('a'))

             A   B   C   D   E   F
a      b                          
first  one  a1  b1  c1  d1  e1  f1
       two  a2  b2  c2  d1  e1  f1
second one  a3  b3  c3  d2  e2  f2
       two  a4  b4  c4  d2  e2  f2

You can follow that up with another rename_axis to get desired results

df1.rename_axis(['a', 'b']).join(df2.rename_axis('a')).rename_axis([None, None])

             A   B   C   D   E   F
first  one  a1  b1  c1  d1  e1  f1
       two  a2  b2  c2  d1  e1  f1
second one  a3  b3  c3  d2  e2  f2
       two  a4  b4  c4  d2  e2  f2