I am working on a recurrent language model. To learn word embeddings that I can use to initialize the language model, I am using gensim's word2vec model. After training, the word2vec model holds two vectors for each word in the vocabulary: the word embedding (a row of the input/hidden weight matrix) and the context embedding (a column of the hidden/output weight matrix).
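For concreteness, here is a minimal sketch of how I access both sets of vectors, assuming gensim 4.x with the default negative-sampling objective (the toy corpus and parameter values are just placeholders; with hierarchical softmax the output weights would live in `model.syn1` instead of `model.syn1neg`):

```python
from gensim.models import Word2Vec

# Toy corpus, just for illustration.
sentences = [
    ["the", "quick", "brown", "fox"],
    ["the", "lazy", "dog"],
]

# sg=1 -> skip-gram; negative sampling is used because negative > 0 (the default).
model = Word2Vec(sentences, vector_size=50, min_count=1, sg=1, negative=5, epochs=10)

word = "fox"
idx = model.wv.key_to_index[word]

# Word (input) embedding: one row of the input/hidden weight matrix.
word_vec = model.wv.vectors[idx]      # equivalent to model.wv[word]

# Context (output) embedding: the corresponding slice of the hidden/output
# weight matrix, stored internally as syn1neg under negative sampling.
context_vec = model.syn1neg[idx]

print(word_vec.shape, context_vec.shape)  # (50,) (50,)
```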
As outlined in this post, there are at least three common ways to combine these two embedding vectors:
However, I couldn't find any papers or reports that evaluate which of these strategies works best. So my questions are:
Related (but unanswered) questions: