Python equivalent of daisy() in the cluster package of R

Question 1

python r similarity categorical-data r-daisy

Zhubarb · Oct 15, 2014 · Viewed 11.5k times · Source

Answer

Implementing a Gower function to use with pdist is not enough on its own.

Internally, pdist applies several numerical transformations that fail when the input matrix contains mixed data.

I implemented the Gower function according to the original paper, along with the necessary adaptations to the pdist module (I could not simply override the functions, because the defs in the pdist module are private).

The results I have obtained with this so far match those of R's daisy function.

The source code is available in this Jupyter notebook: https://sourceforge.net/projects/gower-distance-4python/files/
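
For readers who only want to see the idea, here is a minimal sketch of a Gower dissimilarity matrix in plain NumPy. It is not the code from the notebook above and it does not go through pdist. It assumes equal feature weights, no missing values, and only two feature types: numeric (range-normalised absolute difference) and nominal (simple mismatch). Ordinal and asymmetric binary variables, which daisy() also supports, are not handled. The name gower_matrix and the is_categorical argument are hypothetical, chosen for this illustration.

```python
import numpy as np


def gower_matrix(rows, is_categorical):
    """Pairwise Gower dissimilarity for mixed-type rows.

    Assumptions of this sketch: equal weights, no missing values,
    numeric and nominal features only.
    """
    n = len(rows)
    p = len(is_categorical)
    total = np.zeros((n, n))
    for k in range(p):
        col = [row[k] for row in rows]
        if is_categorical[k]:
            # Nominal feature: 0 if the two categories match, 1 otherwise.
            codes = np.unique(np.asarray(col, dtype=str), return_inverse=True)[1]
            d = (codes[:, None] != codes[None, :]).astype(float)
        else:
            # Numeric feature: absolute difference scaled by the column range.
            arr = np.asarray(col, dtype=float)
            value_range = arr.max() - arr.min()
            d = np.abs(arr[:, None] - arr[None, :])
            d = d / value_range if value_range > 0 else np.zeros((n, n))
        total += d
    # Average the per-feature dissimilarities.
    return total / p


# Toy data: one numeric column and one nominal column.
rows = [(1.0, "red"), (3.0, "red"), (5.0, "blue")]
D = gower_matrix(rows, is_categorical=[False, True])
print(D)
```

For the toy data, rows 0 and 1 differ by half the numeric range and share a colour, so their dissimilarity is (0.5 + 0) / 2 = 0.25; rows 0 and 2 differ maximally on both features, giving 1.0.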

Question 2

I have a dataset that contains both categorical (nominal and ordinal) and numerical attributes. I want to calculate the (dis)similarity matrix across my observations using these mixed attributes. Using the daisy() function of the cluster package in R, I can easily get a dissimilarity matrix as follows:

if(!require("cluster")) { install.packages("cluster");  require("cluster") }
data(flower)
as.matrix(daisy(flower, metric = "gower"))

This uses the Gower metric to deal with the nominal variables. Is there a Python equivalent of the daisy() function in R?

Or is there any other module or function that can use the Gower metric, or something similar, to calculate the (dis)similarity matrix for a dataset with mixed (nominal, numeric) attributes?
