How to get word vectors from a gensim Doc2Vec?

V Y · May 20, 2016

I trained a gensim.models.doc2vec.Doc2Vec model:

d2v_model = Doc2Vec(sentences, size=100, window=8, min_count=5, workers=4)

and I can get document vectors with:

docvec = d2v_model.docvecs[0]

How can I get word vectors from the trained model?

Answer

gojomo · May 28, 2016

Doc2Vec inherits from Word2Vec, so you can access word vectors the same way as in Word2Vec, directly by indexing the model:

wv = d2v_model['apple']

Note, however, that a Doc2Vec training mode like pure DBOW (dm=0) doesn't use or train word vectors. (Pure DBOW is fast and still works well for many purposes!) If you do access word vectors from such a model, they'll just be the automatic randomly-initialized vectors, with no meaning.

Only when the Doc2Vec mode itself co-trains word vectors, as in DM mode (the default, dm=1) or in DBOW with optional word training enabled (dm=0, dbow_words=1), are word vectors and doc vectors both learned simultaneously.