I need to calculate the cosine similarity between two lists, let's say for example list 1 which is dataSetI
and list 2 which is dataSetII
. I cannot use anything such as numpy or a statistics module. I must use common modules (math, etc) (and the least modules as possible, at that, to reduce time spent).
Let's say dataSetI
is [3, 45, 7, 2]
and dataSetII
is [2, 54, 13, 15]
. The length of the lists are always equal.
Of course, the cosine similarity is between 0 and 1, and for the sake of it, it will be rounded to the third or fourth decimal with format(round(cosine, 3))
.
Thank you very much in advance for helping.
You should try SciPy. It has a bunch of useful scientific routines for example, "routines for computing integrals numerically, solving differential equations, optimization, and sparse matrices." It uses the superfast optimized NumPy for its number crunching. See here for installing.
Note that spatial.distance.cosine computes the distance, and not the similarity. So, you must subtract the value from 1 to get the similarity.
from scipy import spatial
dataSetI = [3, 45, 7, 2]
dataSetII = [2, 54, 13, 15]
result = 1 - spatial.distance.cosine(dataSetI, dataSetII)