word2vec - get nearest words

blue-sky · Oct 16, 2016 · Viewed 14.6k times

Reading the TensorFlow word2vec model output, how can I output the words related to a specific word?

Reading the source (https://github.com/tensorflow/tensorflow/blob/r0.11/tensorflow/examples/tutorials/word2vec/word2vec_basic.py), I can see how the image is plotted.

But is there a data structure (e.g. a dictionary) created as part of training the model that gives access to the n words nearest to a given word? For example, if word2vec generated this image:

image src: https://www.tensorflow.org/versions/r0.11/tutorials/word2vec/index.html

In this image the words 'to', 'he' and 'it' are in the same cluster. Is there a function which takes 'to' as input and outputs 'he' and 'it' (in this case n=2)?

Answer

Steven Du · Oct 20, 2016

This approach applies to word2vec in general. If you can save the word2vec vectors to a text or binary file, like the Google or GloVe word vectors, then all you need is gensim.
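As a rough illustration of the saving step, the sketch below writes vectors in the word2vec text format: a header line with the vocabulary size and vector dimension, then one line per word with its components. The function name write_word2vec_text and the toy data are hypothetical. For the TensorFlow tutorial script, the assumption is that the words would come from reverse_dictionary and the rows from final_embeddings.

```python
import io


def write_word2vec_text(out, words, vectors):
    """Write vectors to a file-like object in the word2vec text format."""
    dim = len(vectors[0])
    # Header line: number of words, then vector dimension.
    out.write('%d %d\n' % (len(words), dim))
    for word, vec in zip(words, vectors):
        # One line per word: the word, then its components, space-separated.
        out.write(word + ' ' + ' '.join('%.6f' % x for x in vec) + '\n')


# Toy stand-ins for a trained vocabulary and embedding matrix (assumed data).
words = ['to', 'he', 'it']
vectors = [[1.0, 0.1], [0.9, 0.2], [0.8, 0.3]]

buffer = io.StringIO()
write_word2vec_text(buffer, words, vectors)
text = buffer.getvalue()
print(text)
```

To produce a real file, pass an object opened with open(path, 'w') instead of the in-memory buffer. A file written this way is in text format, so it should be loaded without binary=True.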

To install gensim: via GitHub, or from PyPI (pip install gensim).

Python code:

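from gensim.models import KeyedVectors

# fname is the path to the saved vectors (hypothetical file name; use your own).
# Pass binary=True if the file is in the binary word2vec format.
fname = 'vectors.txt'
# In gensim 1.0+ the loader lives on KeyedVectors rather than Word2Vec.
gmodel = KeyedVectors.load_word2vec_format(fname)
# topn must be a keyword argument: the second positional parameter is `negative`.
ms = gmodel.most_similar('good', topn=10)
for word, similarity in ms:
    print(word, similarity)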

However, this performs an exhaustive search over all the words to produce the results. Approximate nearest neighbor (ANN) methods give results faster, with a trade-off in accuracy.
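To make the exhaustive search concrete, here is a minimal plain-Python sketch. It is not gensim's implementation, only an illustration under stated assumptions: the names cosine and nearest_words are hypothetical, and the toy dictionary stands in for real embeddings (for the TensorFlow tutorial script it could presumably be built from final_embeddings and reverse_dictionary).

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_words(word, vectors, n=2):
    """Return the n (word, similarity) pairs most similar to `word`.

    Every other word in `vectors` is scored, so the cost grows
    linearly with the vocabulary size.
    """
    query = vectors[word]
    scored = [(other, cosine(query, vec))
              for other, vec in vectors.items() if other != word]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]


# Toy 2-d embeddings (assumed data, chosen so 'to', 'he', 'it' point the same way).
toy = {
    'to': [1.0, 0.1],
    'he': [0.9, 0.2],
    'it': [0.8, 0.3],
    'blue': [-1.0, 0.5],
}

result = nearest_words('to', toy, n=2)
for other, similarity in result:
    print(other, similarity)
```

With this toy data, the two nearest words to 'to' are 'he' and 'it', in that order.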

Recent versions of gensim can use annoy to perform the ANN search; see the annoy tutorial notebook in the gensim documentation for more information.

FLANN is another library for approximate nearest neighbor search.
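To show where the speed and accuracy trade-off in ANN comes from, here is a small sketch of random-hyperplane hashing. This is a different and much simpler technique than the ones annoy or FLANN implement, and all names in it (make_hyperplanes, signature, build_index, approx_candidates) are hypothetical. Words are grouped into buckets by which side of each random hyperplane their vector falls on, and a query only looks inside its own bucket.

```python
import random


def make_hyperplanes(dim, n_planes, seed=0):
    """Draw n_planes random normal vectors of length dim (seeded for repeatability)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_planes)]


def signature(vec, planes):
    """One bit per hyperplane: True if vec lies on its non-negative side."""
    return tuple(sum(a * b for a, b in zip(vec, plane)) >= 0.0
                 for plane in planes)


def build_index(vectors, planes):
    """Group words into buckets keyed by their signature."""
    buckets = {}
    for word, vec in vectors.items():
        buckets.setdefault(signature(vec, planes), []).append(word)
    return buckets


def approx_candidates(word, vectors, planes, buckets):
    """Return the other words in the query's bucket.

    Only this bucket is searched, which is fast, but a true neighbor
    that landed in another bucket is missed.
    """
    key = signature(vectors[word], planes)
    return [w for w in buckets[key] if w != word]


# Toy 2-d embeddings (assumed data).
toy = {
    'to': [1.0, 0.1],
    'to_scaled': [2.0, 0.2],    # same direction as 'to'
    'opposite': [-1.0, -0.1],   # opposite direction to 'to'
    'he': [0.9, 0.2],
}

planes = make_hyperplanes(dim=2, n_planes=4)
buckets = build_index(toy, planes)
candidates = approx_candidates('to', toy, planes, buckets)
print(candidates)
```

The candidates can then be ranked exactly, for example by cosine similarity, which is cheap because the bucket is small. More hyperplanes give smaller buckets and faster queries but more missed neighbors.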