How can I calculate the Jaccard Similarity of two lists containing strings in Python?

Aventinus · Oct 27, 2017 · Viewed 54.1k times

I have two lists with usernames and I want to calculate the Jaccard similarity. Is it possible?

This thread shows how to calculate the Jaccard similarity between two strings; however, I want to apply it to two lists, where each element is one word (e.g., a username).

Answer

Aventinus · Oct 30, 2017

I ended up writing my own solution after all:

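def jaccard_similarity(list1, list2):
    # Work on sets so that duplicate entries do not inflate the union size.
    set1, set2 = set(list1), set(list2)
    intersection = len(set1 & set2)
    union = len(set1 | set2)
    # Two empty lists have an empty union; returning 0.0 here is a convention.
    return float(intersection) / union if union else 0.0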
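
For anyone who wants to sanity-check the numbers, here is a small self-contained sketch. It assumes that a username repeated within one list should count only once, since Jaccard similarity is defined on sets. The helper name `jaccard_on_sets` and the usernames are made up for illustration, and returning 0.0 for two empty lists is a convention I picked, not part of the definition.

```python
def jaccard_on_sets(items_a, items_b):
    """Illustrative helper: |A & B| / |A | B| over the distinct items."""
    set_a, set_b = set(items_a), set(items_b)
    union = set_a | set_b
    if not union:
        return 0.0  # assumption: two empty lists score 0.0
    return len(set_a & set_b) / len(union)


# Made-up usernames, for illustration only; "bob" appears twice in users_a.
users_a = ["alice", "bob", "carol", "bob"]
users_b = ["bob", "carol", "dave"]

# Shared: bob, carol (2). Distinct overall: alice, bob, carol, dave (4).
print(jaccard_on_sets(users_a, users_b))  # 0.5
```

If the union were computed from the raw list lengths instead (4 + 3 - 2 = 5), the repeated "bob" would push the score down to 0.4, which is why the lists are converted to sets first.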