Ways to calculate similarity

Question 1

statistics social-networking data-mining pattern-recognition similarity

MarySheen · Jun 5, 2010 · Viewed 10k times · Source

Answer

Another option, in R, is to compute all the pairwise dissimilarities (distances) between the observations in the data set with daisy() from the cluster package. The original variables may be of mixed types. Nominal, ordinal, and (a)symmetric binary data are handled by the general dissimilarity coefficient of Gower (Gower, J. C. (1971). A general coefficient of similarity and some of its properties. Biometrics 27, 857–874). For more, check out this on page 47. If x contains any columns of these data types, Gower's coefficient will be used as the metric.
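For reference, Gower's dissimilarity between observations i and j over p variables can be written as below. This is a sketch of the standard form: the symbols (δ for the "comparable" indicator, R_k for the range of variable k) are notation chosen here, not taken from the original answer, and ordinal variables (which are typically replaced by standardized ranks) are left out.

```latex
d(i,j) = \frac{\sum_{k=1}^{p} \delta_{ijk}\, d_{ijk}}{\sum_{k=1}^{p} \delta_{ijk}},
\qquad
d_{ijk} =
\begin{cases}
  \dfrac{|x_{ik} - x_{jk}|}{R_k} & \text{numeric variable } k,\ R_k = \max_i x_{ik} - \min_i x_{ik} \\[2ex]
  0 & \text{nominal variable } k,\ x_{ik} = x_{jk} \\[1ex]
  1 & \text{nominal variable } k,\ x_{ik} \neq x_{jk}
\end{cases}
```

Here δ_ijk is 0 when variable k cannot be compared for the pair (for instance, a missing value) and 1 otherwise, so every d(i,j) lies between 0 and 1.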

For example:

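library(cluster)

# Five users: age, skin type, hair type, lifestyle (each stored as a factor)
x1 <- factor(c(10, 12, 25, 14, 29))
x2 <- factor(c("oily", "dry", "dry", "dry", "oily"))
x3 <- factor(c("medium", "short", "medium", "medium", "long"))
x4 <- factor(c("active outdoor lover", "TV junky", "TV junky", "active outdoor lover", "TV junky"))

# cbind() on factors returns a numeric matrix of integer level codes,
# so daisy() below computes Euclidean distances between those codes
x <- cbind(x1, x2, x3, x4)

daisy(x, metric = "euclidean")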

you'll get:

Dissimilarities :
         1        2        3        4
2 2.000000                           
3 3.316625 2.236068                  
4 2.236068 1.732051 1.414214         
5 4.242641 3.741657 1.732051 2.645751
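To have daisy() apply Gower's coefficient to these attributes, the columns need to keep their types. A minimal sketch, assuming age is stored as a number and the other attributes as factors in a data.frame (the names users, age, skin, hair, lifestyle, d, and sim are illustrative, not from the original answer):

```r
library(cluster)

# Same five users, but age stays numeric and the categorical
# attributes stay factors inside a data.frame (no cbind()).
users <- data.frame(
  age       = c(10, 12, 25, 14, 29),
  skin      = factor(c("oily", "dry", "dry", "dry", "oily")),
  hair      = factor(c("medium", "short", "medium", "medium", "long")),
  lifestyle = factor(c("active outdoor lover", "TV junky", "TV junky",
                       "active outdoor lover", "TV junky"))
)

# Gower dissimilarities: numeric columns are scaled by their range,
# factor columns contribute 0 (same level) or 1 (different level).
d <- daisy(users, metric = "gower")

# Each dissimilarity lies in [0, 1], so 1 - d can serve as a similarity.
sim <- 1 - as.matrix(d)
sim[1, 2]  # similarity between user 1 and user 2
```

Treating 1 - d as the similarity is one simple convention; whether it suits the site depends on how the scores are used.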

If you are interested in a method for dimensionality reduction of categorical data (which is also a way to arrange variables into homogeneous clusters), check this.

Question 2

I am building a community website that requires me to calculate the similarity between any two users. Each user is described by the following attributes:

age, skin type (oily, dry), hair type (long, short, medium), lifestyle (active outdoor lover, TV junky) and others.

Can anyone tell me how to go about this problem or point me to some resources?
