How do I calculate the cosine similarity of two vectors?

shiva · Feb 6, 2009 · Viewed 63k times

How do I find the cosine similarity between vectors?

I need to find the similarity to measure the relatedness between two lines of text.

For example, I have two sentences like:

system for user interface

user interface machine

… and their respective vectors after TF-IDF, followed by normalisation using LSI, for example [1,0.5] and [0.5,1].

How do I measure the similarity between these vectors?

Answer

Alphaaa · Apr 7, 2014

If you want to avoid relying on third-party libraries for such a simple task, here is a plain Java implementation:

public static double cosineSimilarity(double[] vectorA, double[] vectorB) {
    double dotProduct = 0.0;
    double normA = 0.0; // running sum of squares of vectorA
    double normB = 0.0; // running sum of squares of vectorB
    for (int i = 0; i < vectorA.length; i++) {
        dotProduct += vectorA[i] * vectorB[i];
        normA += vectorA[i] * vectorA[i];
        normB += vectorB[i] * vectorB[i];
    }
    // cos(theta) = (A . B) / (|A| * |B|); the result is NaN if either vector is all zeros
    return dotProduct / (Math.sqrt(normA) * Math.sqrt(normB));
}

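Note that the function assumes that the two vectors have the same length. If vectorB is shorter than vectorA, the loop throws an ArrayIndexOutOfBoundsException; if it is longer, the extra elements are silently ignored. You may want to check the lengths explicitly for safety.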
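The length check suggested above could look something like the sketch below. The class name CosineSimilarity and the choice to throw IllegalArgumentException (both for mismatched lengths and for an all-zero vector, where the cosine is undefined) are assumptions made for illustration, not part of the original answer; returning 0 or NaN for a zero vector would be equally defensible depending on the application.

```java
import java.util.Objects;

// Hypothetical wrapper class; the name is an assumption for this sketch.
final class CosineSimilarity {

    private CosineSimilarity() {} // static utility, not meant to be instantiated

    // Cosine of the angle between two vectors of equal length.
    // Throws IllegalArgumentException for mismatched lengths or a zero vector
    // (a design choice of this sketch, not something the answer prescribes).
    static double cosineSimilarity(double[] vectorA, double[] vectorB) {
        Objects.requireNonNull(vectorA, "vectorA");
        Objects.requireNonNull(vectorB, "vectorB");
        if (vectorA.length != vectorB.length) {
            throw new IllegalArgumentException(
                "Vectors must have the same length: "
                    + vectorA.length + " vs " + vectorB.length);
        }
        double dotProduct = 0.0;
        double sumSquaresA = 0.0;
        double sumSquaresB = 0.0;
        for (int i = 0; i < vectorA.length; i++) {
            dotProduct += vectorA[i] * vectorB[i];
            sumSquaresA += vectorA[i] * vectorA[i];
            sumSquaresB += vectorB[i] * vectorB[i];
        }
        // A zero vector has no direction, so the cosine is undefined.
        if (sumSquaresA == 0.0 || sumSquaresB == 0.0) {
            throw new IllegalArgumentException(
                "Cosine similarity is undefined for a zero vector");
        }
        return dotProduct / (Math.sqrt(sumSquaresA) * Math.sqrt(sumSquaresB));
    }

    public static void main(String[] args) {
        // Example vectors taken from the question.
        double[] a = {1, 0.5};
        double[] b = {0.5, 1};
        // Dot product is 1.0 and both norms are sqrt(1.25),
        // so the result is approximately 0.8.
        System.out.println(cosineSimilarity(a, b));
    }
}
```

For the vectors in the question, the dot product is 1.0 and each vector has squared norm 1.25, so the similarity works out to 1.0 / 1.25, or roughly 0.8 (floating-point rounding may change the last digit).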