How do I find the cosine similarity between vectors?
I need to find the similarity to measure the relatedness between two lines of text.
For example, I have two sentences like:
system for user interface
user interface machine
… and their respective vectors after TF-IDF, followed by normalisation using LSI, for example [1, 0.5] and [0.5, 1].
How do I measure the similarity between these vectors?
If you want to avoid relying on third-party libraries for such a simple task, here is a plain Java implementation:
    public static double cosineSimilarity(double[] vectorA, double[] vectorB) {
        double dotProduct = 0.0;
        double normA = 0.0; // accumulates the squared magnitude of vectorA
        double normB = 0.0; // accumulates the squared magnitude of vectorB
        for (int i = 0; i < vectorA.length; i++) {
            dotProduct += vectorA[i] * vectorB[i];
            normA += Math.pow(vectorA[i], 2);
            normB += Math.pow(vectorB[i], 2);
        }
        // cos(theta) = (A . B) / (|A| * |B|)
        return dotProduct / (Math.sqrt(normA) * Math.sqrt(normB));
    }
Note that the function assumes that the two vectors have the same length. You may want to explicitly check this for safety.
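A minimal sketch of what such a check could look like, if it helps. The class name CheckedCosine and the choice of IllegalArgumentException are my own assumptions, not part of any library. It also rejects zero vectors, for which the plain version would divide by zero and return NaN.

```java
// Hypothetical helper class: the name and the exception type are assumptions.
final class CheckedCosine {

    private CheckedCosine() {
        // static utility, not meant to be instantiated
    }

    static double cosineSimilarity(double[] vectorA, double[] vectorB) {
        // Fail early instead of throwing ArrayIndexOutOfBoundsException
        // or silently ignoring the tail of the longer vector.
        if (vectorA.length != vectorB.length) {
            throw new IllegalArgumentException(
                "Vectors must have the same length: "
                + vectorA.length + " vs " + vectorB.length);
        }

        double dotProduct = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < vectorA.length; i++) {
            dotProduct += vectorA[i] * vectorB[i];
            normA += vectorA[i] * vectorA[i];
            normB += vectorB[i] * vectorB[i];
        }

        // A zero vector has no direction, so the cosine is undefined.
        if (normA == 0.0 || normB == 0.0) {
            throw new IllegalArgumentException(
                "Cosine similarity is undefined for a zero vector");
        }

        return dotProduct / (Math.sqrt(normA) * Math.sqrt(normB));
    }
}
```

For the vectors in the question, [1, 0.5] and [0.5, 1], the dot product is 1.0 and each squared norm is 1.25, so the result is 1.0 / 1.25, which is approximately 0.8 (up to floating-point rounding). Whether to throw or to return 0 for a zero vector is a design choice that depends on how your documents are vectorised.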