Is there a pre-trained doc2vec model?

gensim doc2vec

Idriss Brahimi · Jul 2, 2018 · Viewed 10.3k times · Source

Is there a pre-trained doc2vec model with a large data set, like Wikipedia or similar?

Answer

I don't know of any good one. There's one linked from this project, but:

it's based on a custom fork of an older gensim, so it won't load in recent versions
it's not clear what parameters or data it was trained with, and the associated paper may have made uninformed choices about the effects of parameters
it doesn't appear to be large enough to include actual doc-vectors for either Wikipedia articles (4-million-plus) or article paragraphs (tens of millions), or a significant number of word-vectors, so it's unclear what's been discarded

Although it takes a long time and a significant amount of RAM, gensim includes a Jupyter notebook that demonstrates creating a Doc2Vec model from Wikipedia:

https://github.com/RaRe-Technologies/gensim/blob/develop/docs/notebooks/doc2vec-wikipedia.ipynb
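
The notebook is the real reference. For orientation only, here is a much smaller sketch of the same train-it-yourself workflow. It is not the notebook's code: the helper names (`tokenize`, `iter_tagged`, `train_doc2vec`), the whitespace tokenizer, and the hyperparameter values are all illustrative assumptions, and the parameter names assume a gensim 4.x install (older releases spell some of them differently).

```python
from typing import Iterable, Iterator, List, Tuple


def tokenize(text: str) -> List[str]:
    """Crude tokenizer (an assumption): lowercase and split on whitespace."""
    return text.lower().split()


def iter_tagged(texts: Iterable[str]) -> Iterator[Tuple[List[str], List[int]]]:
    """Yield (tokens, [tag]) pairs, using each document's position as its tag."""
    for i, text in enumerate(texts):
        yield tokenize(text), [i]


def train_doc2vec(texts: Iterable[str], vector_size: int = 50, epochs: int = 20):
    """Train a small Doc2Vec model; the defaults are illustrative, not tuned."""
    # Imported here so the helpers above can be used without gensim installed.
    from gensim.models.doc2vec import Doc2Vec, TaggedDocument

    corpus = [TaggedDocument(words=w, tags=t) for w, t in iter_tagged(texts)]
    # min_count=1 only because the toy corpus is tiny; real corpora usually
    # want a higher threshold to drop rare words.
    model = Doc2Vec(vector_size=vector_size, min_count=1, epochs=epochs, workers=1)
    model.build_vocab(corpus)
    model.train(corpus, total_examples=model.corpus_count, epochs=model.epochs)
    return model


# Hypothetical usage (requires gensim; the file name is a placeholder):
#   model = train_doc2vec(["first document text", "second document text"])
#   model.save("my_doc2vec.model")
#   vec = model.infer_vector(tokenize("some unseen text"))
```

A model saved this way can be reloaded with `Doc2Vec.load(...)` by anyone on a compatible gensim version, which is what makes a self-trained model shareable.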

So, I would recommend fixing the mistakes in your attempt. (And, if you succeed in creating a model, and want to document it for others, you could upload it somewhere for others to re-use.)
