I have two lists, and I want to compute the similarity between every pair of words across the two lists and find the maximum similarity. Here is my code:
from nltk.corpus import wordnet

list1 = ['Compare', 'require']
list2 = ['choose', 'copy', 'define', 'duplicate', 'find', 'how', 'identify', 'label', 'list', 'listen', 'locate', 'match', 'memorise', 'name', 'observe', 'omit', 'quote', 'read', 'recall', 'recite', 'recognise', 'record', 'relate', 'remember', 'repeat', 'reproduce', 'retell', 'select', 'show', 'spell', 'state', 'tell', 'trace', 'write']
list = []
for word1 in list1:
    for word2 in list2:
        wordFromList1 = wordnet.synsets(word1)[0]
        wordFromList2 = wordnet.synsets(word2)[0]
        s = wordFromList1.wup_similarity(wordFromList2)
        list.append(s)
print(max(list))
But this results in an error:
wordFromList2 = wordnet.synsets(word2)[0]
IndexError: list index out of range
Please help me to fix this.
Thanking you
You get the error when a synset list is empty and you try to take the element at the (non-existent) index zero; wordnet.synsets() returns an empty list for a word it has no entry for. But why check only the zeroth element? If you want to check everything, try all pairs of elements in the returned synsets. You can use itertools.product() to save yourself two for-loops:
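from itertools import product

sims = []
for word1, word2 in product(list1, list2):
    syns1 = wordnet.synsets(word1)
    syns2 = wordnet.synsets(word2)
    # If either list is empty, product() yields nothing, so no IndexError.
    for sense1, sense2 in product(syns1, syns2):
        d = wordnet.wup_similarity(sense1, sense2)
        # Record the two senses that were compared, not the whole synset lists.
        sims.append((d, sense1, sense2))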
This is inefficient because the same synsets are looked up again and again, but it is the closest to the logic of your code. If you have enough data to make speed an issue, you can speed it up by collecting the synsets for all words in list1 and list2 once, and taking the product of the synsets.
>>> allsyns1 = set(ss for word in list1 for ss in wordnet.synsets(word))
>>> allsyns2 = set(ss for word in list2 for ss in wordnet.synsets(word))
>>> best = max((wordnet.wup_similarity(s1, s2) or 0, s1, s2) for s1, s2 in
...         product(allsyns1, allsyns2))
>>> print(best)
(0.9411764705882353, Synset('command.v.02'), Synset('order.v.01'))
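If you want this as a reusable function, here is a minimal sketch of the same idea. The names best_match, get_senses, similarity, toy_senses and toy_scores are made up for illustration and are not part of NLTK. The lookup and similarity functions are passed in as arguments, so the example runs on small stand-in data without WordNet installed. It compares scores only, so the senses themselves never need to be orderable, and it skips pairs for which the similarity function returns None.

```python
from itertools import product


def best_match(words1, words2, get_senses, similarity):
    """Return (score, sense1, sense2) for the most similar pair, or None.

    get_senses and similarity are hypothetical parameter names: any callables
    that map a word to a list of senses and two senses to a score (or None).
    """
    # Look up each word's senses once; words with no senses contribute nothing.
    senses1 = {s for w in words1 for s in get_senses(w)}
    senses2 = {s for w in words2 for s in get_senses(w)}

    best = None
    for s1, s2 in product(senses1, senses2):
        score = similarity(s1, s2)
        if score is None:  # no similarity defined for this pair
            continue
        if best is None or score > best[0]:
            best = (score, s1, s2)
    return best


# Stand-in data (invented values) so the sketch runs without WordNet.
toy_senses = {"compare": ["compare.v.01"], "match": ["match.v.01"], "how": []}
toy_scores = {("compare.v.01", "match.v.01"): 0.8}

result = best_match(
    ["compare", "xyzzy"],          # "xyzzy" has no senses and is skipped
    ["match", "how"],              # "how" maps to an empty list here
    get_senses=lambda w: toy_senses.get(w.lower(), []),
    similarity=lambda a, b: toy_scores.get((a, b)),
)
print(result)
```

With NLTK loaded, the call would presumably be best_match(list1, list2, wordnet.synsets, wordnet.wup_similarity). A return value of None then means that no word pair had any comparable senses.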