How to use word2vec to calculate the similarity between two given words?

zhfkt · Feb 24, 2014

Word2vec is an open-source tool from Google for computing distances between words. Given an input word, it outputs a list of words ranked by similarity. E.g.

Input:

france

Output:

            Word       Cosine distance

            spain              0.678515
          belgium              0.665923
      netherlands              0.652428
            italy              0.633130
      switzerland              0.622323
       luxembourg              0.610033
         portugal              0.577154
           russia              0.571507
          germany              0.563291
        catalonia              0.534176

However, what I need is to calculate the similarity for two given words. If I give 'france' and 'spain', how can I get the score 0.678515 directly, without reading through the whole word list returned for 'france'?

Answer

Satarupa Guha · Aug 21, 2014

gensim has a Python implementation of Word2Vec that provides a built-in utility for finding the similarity between two words given by the user. You can refer to the following:

  1. Intro: http://radimrehurek.com/gensim/models/word2vec.html
  2. Tutorial: http://radimrehurek.com/2014/02/word2vec-tutorial/

The syntax in Python for finding similarity between two words goes like this:

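>>> from gensim.models import Word2Vec
>>> model = Word2Vec.load('path/to/your/model')  # path to a model saved with gensim
>>> # Word vectors live on model.wv; older gensim releases also accept model.similarity(...)
>>> model.wv.similarity('france', 'spain')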
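If you would rather see what that score is made of, here is a minimal sketch of cosine similarity in plain Python. As far as I know, the number the word2vec tool prints under "Cosine distance" is the cosine similarity of the two word vectors. The `cosine_similarity` helper and the `toy_vectors` values below are made up for illustration and are not taken from any trained model; with a real model you would pass in the two vectors it stores for 'france' and 'spain'.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Hypothetical 3-dimensional vectors, invented for this example only.
# Real word2vec vectors typically have a few hundred dimensions.
toy_vectors = {
    "france": [1.0, 2.0, 0.0],
    "spain": [2.0, 1.0, 0.0],
}

score = cosine_similarity(toy_vectors["france"], toy_vectors["spain"])
print(score)
```

This only needs the two vectors for the two words, so nothing has to be ranked against the rest of the vocabulary.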