In spaCy, how do you use your own word2vec model created in gensim?

Subigya Upadhyay · May 22, 2018

I have trained my own word2vec model in gensim and I am trying to load it in spaCy. As I understand it, I first need to save the model to disk and then load it through spaCy's init-model, but I can't figure out exactly how.

gensimmodel
Out[252]:
<gensim.models.word2vec.Word2Vec at 0x110b24b70>

import spacy
spacy.load(gensimmodel)

OSError: [E050] Can't find model 'Word2Vec(vocab=250, size=1000, alpha=0.025)'. It doesn't seem to be a shortcut link, a Python package or a valid path to a data directory.

Answer

hbot · Nov 7, 2018

Train and save your model in plain-text format:

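import os
from gensim.test.utils import common_texts
from gensim.models import Word2Vec

# Make sure the output directory exists before saving.
os.makedirs("./data", exist_ok=True)

# Note: gensim 4.x renamed the `size` parameter to `vector_size`.
model = Word2Vec(common_texts, size=100, window=5, min_count=1, workers=4)
# Write only the word vectors, in word2vec plain-text format.
model.wv.save_word2vec_format("./data/word2vec.txt")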
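
Before handing the file to spaCy, it can help to confirm that it really is in word2vec plain-text format: a header line with the vocabulary size and the vector dimension, followed by one line per word holding the word and its values. The following is a minimal sketch using only the standard library; the function name check_word2vec_text is made up for illustration and is not part of gensim or spaCy.

```python
def check_word2vec_text(path):
    """Validate a word2vec plain-text file and return (vocab_size, dim).

    Hypothetical helper: checks that the header matches the number of rows
    and that every row has one token plus `dim` values.
    """
    with open(path, encoding="utf-8") as f:
        # Header line: "<vocab_size> <dim>"
        vocab_size, dim = (int(x) for x in f.readline().split())
        rows = 0
        for line_no, line in enumerate(f, start=2):
            parts = line.rstrip("\n").split(" ")
            if len(parts) != dim + 1:
                raise ValueError(
                    f"line {line_no}: expected {dim + 1} fields, got {len(parts)}"
                )
            rows += 1
    if rows != vocab_size:
        raise ValueError(f"header says {vocab_size} rows, file has {rows}")
    return vocab_size, dim
```

Calling check_word2vec_text("./data/word2vec.txt") on the file written above should return the vocabulary size and 100 as the dimension, assuming the save step succeeded.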

Gzip the text file:

gzip ./data/word2vec.txt

This produces ./data/word2vec.txt.gz.
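
If the gzip command-line tool is not available (for example on Windows), the same step can probably be done from Python. This is a minimal sketch with the standard library; the helper name gzip_file is an assumption for illustration. Unlike the shell command, it leaves the original .txt file in place.

```python
import gzip
import shutil


def gzip_file(src, dst=None):
    """Compress `src` to `dst` (default: src + '.gz') and return the output path."""
    dst = dst or src + ".gz"
    # Stream the bytes so large vector files are not loaded into memory at once.
    with open(src, "rb") as fin, gzip.open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dst
```

For the file from the previous step, the call would be gzip_file("./data/word2vec.txt").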

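Convert the gzipped vectors into a spaCy model directory (this is the spaCy v2 command; spaCy v3 replaced init-model with init vectors):

python -m spacy init-model en ./data/spacy.word2vec.model --vectors-loc ./data/word2vec.txt.gz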

Load the vectors using:

import spacy
nlp = spacy.load('./data/spacy.word2vec.model/')
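
To check that the vectors actually made it into the loaded pipeline, one option is to look at the size of the vector table and test a few known words. The sketch below is only an illustration: summarize_vectors is a hypothetical helper, and it relies on nlp.vocab.vectors.shape and nlp.vocab.has_vector from spaCy's Vocab API.

```python
def summarize_vectors(nlp, words):
    """Report the vector table size and which of `words` have a vector.

    `nlp` is expected to be a loaded spaCy pipeline (hypothetical helper).
    """
    rows, dim = nlp.vocab.vectors.shape
    coverage = {w: bool(nlp.vocab.has_vector(w)) for w in words}
    return {"rows": rows, "dim": dim, "coverage": coverage}


# Usage (requires spaCy and the model directory created above):
#   import spacy
#   nlp = spacy.load("./data/spacy.word2vec.model/")
#   print(summarize_vectors(nlp, ["computer", "human"]))
```

With the common_texts toy corpus, "computer" and "human" are in the training data, so they are reasonable words to probe.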