I have two pandas dataframes, d1 and d2, that look like this:
d1 looks like:
output value1 value2 value2
1 100 103 87
1 201 97.5 88.9
1 144 54 85
d2 looks like:
output value1 value2 value2
0 100 103 87
0 201 97.5 88.9
0 144 54 85
0 100 103 87
0 201 97.5 88.9
0 144 54 85
The column output has a value of 1 for all rows in d1 and 0 for all rows in d2. It's a grouping variable. I need to find the Euclidean distance between each row of d1 and each row of d2 (not within d1 or within d2). If d1 has m rows and d2 has n rows, then the distance matrix will have m rows and n columns.
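To make the target shape concrete, here is a minimal pure-Python sketch of that m-by-n matrix, built from the sample rows shown above. The names d1_rows, d2_rows and dist are made up for illustration, the output column is left out by hand, and math.dist needs Python 3.8 or later. It is only meant to spell out the definition, not to be an efficient solution.

```python
import math

# Value columns only; the 'output' grouping column is deliberately left out.
d1_rows = [(100, 103, 87), (201, 97.5, 88.9), (144, 54, 85)]
d2_rows = d1_rows * 2  # the sample d2 repeats the same three rows twice

# dist[i][j] is the Euclidean distance between row i of d1 and row j of d2.
dist = [[math.dist(a, b) for b in d2_rows] for a in d1_rows]

print(len(dist), len(dist[0]))  # m rows, n columns
for row in dist:
    print([round(x, 6) for x in row])
```

Since every row of d1 also appears in d2 in this sample, each row of the matrix contains zeros wherever the two rows are identical.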
This can be done with scipy.spatial.distance.cdist:
import pandas as pd
from scipy.spatial.distance import cdist

# Skip the first column (output) so only the value columns are compared
ary = cdist(d1.iloc[:, 1:], d2.iloc[:, 1:], metric='euclidean')
pd.DataFrame(ary)
Out[1274]:
0 1 2 3 4 5
0 0.000000 101.167485 65.886266 0.000000 101.167485 65.886266
1 101.167485 0.000000 71.808495 101.167485 0.000000 71.808495
2 65.886266 71.808495 0.000000 65.886266 71.808495 0.000000