I have seen several posts about this but I could not get my head around how merge, join and concat would deal with this. How can I merge two dataframes to find matching indexes?
in:
import pandas as pd
import numpy as np
row_x1 = ['a1','b1','c1']
row_x2 = ['a2','b2','c2']
row_x3 = ['a3','b3','c3']
row_x4 = ['a4','b4','c4']
index_arrays = [np.array(['first', 'first', 'second', 'second']), np.array(['one','two','one','two'])]
df1 = pd.DataFrame([row_x1,row_x2,row_x3,row_x4], columns=list('ABC'), index=index_arrays)
print(df1)
out:
A B C
first one a1 b1 c1
two a2 b2 c2
second one a3 b3 c3
two a4 b4 c4
in:
row_y1 = ['d1','e1','f1']
row_y2 = ['d2','e2','f2']
df2 = pd.DataFrame([row_y1,row_y2], columns=list('DEF'), index=['first','second'])
print(df2)
out
D E F
first d1 e1 f1
second d2 e2 f2
in other words, how can I merge them to achieve df3 (as follows)?
in
row_x1 = ['a1','b1','c1']
row_x2 = ['a2','b2','c2']
row_x3 = ['a3','b3','c3']
row_x4 = ['a4','b4','c4']
row_y1 = ['d1','e1','f1']
row_y2 = ['d2','e2','f2']
row_z1 = row_x1 + row_y1
row_z2 = row_x2 + row_y1
row_z3 = row_x3 + row_y2
row_z4 = row_x4 + row_y2
df3 = pd.DataFrame([row_z1,row_z2,row_z3,row_z4], columns=list('ABCDEF'), index=index_arrays)
print(df3)
out
A B C D E F
first one a1 b1 c1 d1 e1 f1
two a2 b2 c2 d1 e1 f1
second one a3 b3 c3 d2 e2 f2
two a4 b4 c4 d2 e2 f2
Option 1
Use pd.DataFrame.reindex
+ pd.DataFrame.join
reindex
has a convenient level
parameter that allows you to expand on the index levels not present.
df1.join(df2.reindex(df1.index, level=0))
A B C D E F
first one a1 b1 c1 d1 e1 f1
two a2 b2 c2 d1 e1 f1
second one a3 b3 c3 d2 e2 f2
two a4 b4 c4 d2 e2 f2
Option 2
You can rename your axes and join
will work
df1.rename_axis(['a', 'b']).join(df2.rename_axis('a'))
A B C D E F
a b
first one a1 b1 c1 d1 e1 f1
two a2 b2 c2 d1 e1 f1
second one a3 b3 c3 d2 e2 f2
two a4 b4 c4 d2 e2 f2
You can follow that up with another rename_axis
to get desired results
df1.rename_axis(['a', 'b']).join(df2.rename_axis('a')).rename_axis([None, None])
A B C D E F
first one a1 b1 c1 d1 e1 f1
two a2 b2 c2 d1 e1 f1
second one a3 b3 c3 d2 e2 f2
two a4 b4 c4 d2 e2 f2