I tried to follow this, but somehow I wasted a lot of time and ended up with nothing useful.
I just want to train a GloVe model on my own corpus (a ~900 MB corpus.txt file).
I downloaded the files provided in the link above and compiled them using Cygwin, after editing the demo.sh file to set VOCAB_FILE=corpus.txt. Should I leave CORPUS=text8 unchanged?
The output was:
How can I use those files to load a GloVe model in Python?
You can do it using the glove_python library.
Install it: pip install glove_python
Then:
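from glove import Corpus, Glove

# lines must be an iterable of token lists; here, one list of words per line
# of corpus.txt (assumes UTF-8 text and whitespace tokenization)
with open('corpus.txt', encoding='utf-8') as f:
    lines = [line.split() for line in f]

# Create a corpus object
corpus = Corpus()
# Fit the corpus to build the co-occurrence matrix that GloVe trains on
corpus.fit(lines, window=10)
# Train the GloVe model on the co-occurrence matrix
glove = Glove(no_components=5, learning_rate=0.05)
glove.fit(corpus.matrix, epochs=30, no_threads=4, verbose=True)
# Attach the word-to-id dictionary so vectors can be looked up by word
glove.add_dictionary(corpus.dictionary)
glove.save('glove.model')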
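With a corpus of around 900 MB, you may not want to hold every tokenized line in memory at once. As far as I know, corpus.fit only needs an iterable of token lists, so a small re-iterable wrapper around the file might be enough. This is a rough sketch, not part of glove_python: the LineTokens class, the whitespace tokenization, and the UTF-8 encoding are all my own assumptions, and the temporary file only stands in for corpus.txt so the snippet runs on its own.

```python
import os
import tempfile


class LineTokens:
    """Hypothetical helper: yields one list of tokens per non-blank line."""

    def __init__(self, path, encoding="utf-8"):
        self.path = path
        self.encoding = encoding

    def __iter__(self):
        # Re-open the file on each pass so the object can be iterated again.
        with open(self.path, encoding=self.encoding) as handle:
            for line in handle:
                tokens = line.split()  # assumption: whitespace tokenization
                if tokens:  # skip blank lines
                    yield tokens


# Tiny stand-in file so the sketch runs without the real corpus.txt.
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as tmp:
    tmp.write("The quick brown fox\n\njumps over the lazy dog\n")
    sample_path = tmp.name

lines = LineTokens(sample_path)
first_pass = list(lines)
second_pass = list(lines)
os.remove(sample_path)
```

In the real script you would pass LineTokens('corpus.txt') as the lines argument instead of building a list first.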
Reference: word vectorization using glove
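If you would rather use the files produced by the compiled demo.sh, the text output (vectors.txt with the default settings, as far as I remember) is one word per line followed by its vector components, separated by spaces. A minimal sketch of reading that format with only the standard library could look like this. The function names load_glove_text and cosine are mine, and the three sample vectors are made up purely so the snippet runs without the real file.

```python
import io
import math


def load_glove_text(handle):
    """Parse lines of the form 'word v1 v2 ... vn' into a dict of float lists."""
    vectors = {}
    for line in handle:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip empty or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Made-up sample data in the same layout as the text vector file.
sample = io.StringIO(
    "king 0.5 0.1 0.3\n"
    "queen 0.4 0.2 0.3\n"
    "apple -0.2 0.9 0.0\n"
)
vectors = load_glove_text(sample)

# With the real output you would do something like:
# with open("vectors.txt", encoding="utf-8") as f:
#     vectors = load_glove_text(f)
```

For a large vocabulary, a dict of Python lists will be slow and memory hungry, so converting the values to NumPy arrays is probably worth it once this works.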