I would like to implement the word2vec algorithm in Keras. Is this possible? How can I fit the model? Should I use a custom loss function?
Is this possible?
You've already answered it yourself: yes. In addition to `word2veckeras`, which uses `gensim`, here's another CBOW implementation that doesn't have extra dependencies (just in case: I'm not affiliated with that repo). You can use them as examples.
How can I fit the model?
Since the training data is a large corpus of sentences, the most convenient method is `model.fit_generator`, which "fits the model on data generated batch-by-batch by a Python generator". The generator runs indefinitely, yielding (word, context, target) CBOW (or SG) tuples, but you manually specify `samples_per_epoch` and `nb_epoch` to limit the training. This way you decouple the sentence analysis (tokenization, word index table, sliding window, etc.) from the actual Keras model, and save a lot of resources.
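Here is a minimal sketch of such a generator for CBOW, assuming you have already turned your corpus into lists of word indices; the names (`cbow_generator`, `sentences`, `vocab_size`, `window`) are illustrative, not fixed, and the `fit_generator` call uses the Keras 1-style arguments mentioned above (newer Keras renames them to `steps_per_epoch` / `epochs`):

```python
import numpy as np
from keras.utils.np_utils import to_categorical

def cbow_generator(sentences, vocab_size, window=2, batch_size=128):
    """Yield (context indices, one-hot target) batches forever.

    `sentences` is assumed to be a list of lists of word indices produced by
    your own tokenization / word-index step; index 0 is reserved for padding.
    """
    contexts, targets = [], []
    while True:  # fit_generator expects an endless generator
        for sentence in sentences:
            for i, target in enumerate(sentence):
                context = sentence[max(0, i - window):i] + sentence[i + 1:i + 1 + window]
                context += [0] * (2 * window - len(context))  # pad to fixed length
                contexts.append(context)
                targets.append(target)
                if len(targets) == batch_size:
                    yield np.array(contexts), to_categorical(targets, vocab_size)
                    contexts, targets = [], []

# Keras 1-style call; hypothetical numbers, tune to your corpus size:
# model.fit_generator(cbow_generator(sentences, vocab_size),
#                     samples_per_epoch=100000, nb_epoch=10)
```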
Should I use custom loss function?
CBOW minimizes the distance between the predicted and the true distribution of the center word, so in the simplest form `categorical_crossentropy` will do it.
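For illustration, a minimal CBOW model that averages the context embeddings and ends in a softmax over the vocabulary, compiled with `categorical_crossentropy`; the sizes here are placeholder assumptions:

```python
from keras.models import Sequential
from keras.layers import Embedding, Lambda, Dense
from keras import backend as K

vocab_size, embedding_dim, window = 10000, 100, 2  # placeholder assumptions

model = Sequential()
# each sample is the 2*window context word indices around the center word
model.add(Embedding(vocab_size, embedding_dim, input_length=2 * window))
# average the context embeddings into a single vector
model.add(Lambda(lambda x: K.mean(x, axis=1), output_shape=(embedding_dim,)))
# predict a distribution over the whole vocabulary for the center word
model.add(Dense(vocab_size, activation='softmax'))
model.compile(optimizer='adagrad', loss='categorical_crossentropy')
```

This model can be trained directly with the generator sketched above.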
If you implement negative sampling, which is a bit more complex yet much more efficient, the loss function changes to `binary_crossentropy`. A custom loss function is unnecessary.
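And a sketch of the negative-sampling variant: the model scores a (word, context) pair with a dot product followed by a sigmoid and trains on 1/0 labels with `binary_crossentropy`. Keras even ships `keras.preprocessing.sequence.skipgrams`, which generates such labelled pairs for you. The `merge` call below is Keras 1 API (Keras 2 replaces it with a `Dot` layer), and the sizes are again placeholders:

```python
from keras.models import Model
from keras.layers import Input, Embedding, Reshape, Activation, merge

vocab_size, embedding_dim = 10000, 100  # placeholder assumptions

word = Input(shape=(1,), dtype='int32')
context = Input(shape=(1,), dtype='int32')

# separate "input" and "output" embedding matrices, as in the original word2vec
word_vec = Embedding(vocab_size, embedding_dim, input_length=1)(word)
context_vec = Embedding(vocab_size, embedding_dim, input_length=1)(context)

# similarity score = dot product of the two embeddings, squashed by a sigmoid
score = merge([word_vec, context_vec], mode='dot', dot_axes=2)
score = Reshape((1,))(score)
output = Activation('sigmoid')(score)

model = Model(input=[word, context], output=output)
model.compile(optimizer='adagrad', loss='binary_crossentropy')

# keras.preprocessing.sequence.skipgrams(sentence, vocab_size) produces
# (word, context) pairs with 1/0 labels that this model can train on directly.
```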
For anyone interested in the details of the math and the probabilistic model, I highly recommend Stanford's CS224D class. Here are the lecture notes on word2vec, CBOW and Skip-Gram.
Another useful reference: word2vec implementation in pure numpy and in C.