What is dimensionality in word embeddings?

manoveg · Jul 30, 2017 · Viewed 14.1k times

I want to understand what is meant by "dimensionality" in word embeddings.

When I embed a word in the form of a matrix for NLP tasks, what role does dimensionality play? Is there a visual example which can help me understand this concept?

Answer

jabalazs · Jun 19, 2018

Answer

A Word Embedding is just a mapping from words to vectors. Dimensionality in word embeddings refers to the length of these vectors.
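For instance, here is a toy mapping (the words and numbers below are made up purely for illustration) in which every word is assigned a vector of length 3, so the dimensionality of this embedding is 3:

# Toy word embedding: every word maps to a vector of length 3,
# so the dimensionality is 3 (the numbers are made up for illustration).
toy_embedding = {
    'dog': [0.21, -0.43, 0.77],
    'cat': [0.19, -0.38, 0.81],
    'car': [-0.65, 0.12, 0.04],
}

print(len(toy_embedding['dog']))  # 3, the dimensionality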

Additional Info

These mappings come in different formats. Most pre-trained embeddings are available as a space-separated text file, where each line contains a word in the first position and its vector representation next to it. If you split such a line on whitespace, you get 1 + dim tokens, where dim is the dimensionality of the word vectors and the extra 1 accounts for the word being represented. See the GloVe pre-trained vectors for a real example.

For example, if you download glove.twitter.27B.zip, unzip it, and run the following Python code:

#!/usr/bin/python3

with open('glove.twitter.27B.50d.txt', encoding='utf-8') as f:
    lines = f.readlines()
lines = [line.rstrip().split() for line in lines]

print(len(lines))          # number of words (aka vocabulary size)
print(len(lines[0]))       # length of a line (1 + dim)
print(lines[130][0])       # word 130
print(lines[130][1:])      # vector representation of word 130 (still as strings)
print(len(lines[130][1:])) # dimensionality of word 130

you would get the output

1193514
51
people
['1.4653', '0.4827', ..., '-0.10117', '0.077996']  # shortened for illustration purposes
50

Somewhat unrelated, but equally important: the lines in these files are sorted by word frequency in the corpus on which the embeddings were trained (most frequent words first).
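If you want to check this yourself, you could print the first few words from the lines parsed in the snippet above (the exact tokens you see will depend on the file you downloaded, so no output is shown here):

# The first entries correspond to the most frequent tokens in the training corpus.
print([line[0] for line in lines[:10]])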


You could also represent these embeddings as a dictionary where the keys are the words and the values are lists representing word vectors. The length of these lists would be the dimensionality of your word vectors.
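As a sketch (reusing the lines variable parsed in the snippet above, and assuming every line is well-formed, i.e. a word followed by dim floats), such a dictionary could be built like this:

# Build a {word: vector} dictionary from the parsed GloVe lines.
embeddings = {line[0]: [float(x) for x in line[1:]] for line in lines}

print(len(embeddings['people']))  # 50, the dimensionality of the word vectors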

A more common practice is to represent them as matrices (also called lookup tables) of dimension (V x D), where V is the vocabulary size (i.e., how many words you have), and D is the dimensionality of each word vector. In this case you need to keep a separate dictionary mapping each word to its corresponding row in the matrix.
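A minimal sketch of this representation, assuming numpy is installed and again reusing the lines variable from above, could look like this:

import numpy as np

# (V x D) embedding matrix: one row per word, D columns per vector.
# Assumes every parsed line is a word followed by D floats.
embedding_matrix = np.array([[float(x) for x in line[1:]] for line in lines])
# Separate dictionary mapping each word to its row in the matrix.
word2idx = {line[0]: i for i, line in enumerate(lines)}

print(embedding_matrix.shape)                      # (1193514, 50), i.e. (V, D)
print(embedding_matrix[word2idx['people']].shape)  # (50,), the vector for "people"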

Background

Regarding your question about the role dimensionality plays, you'll need some theoretical background. But in a few words, the space in which words are embedded has nice properties that allow NLP systems to perform better. One of these properties is that words with similar meanings are spatially close to each other, that is, they have similar vector representations, as measured by Euclidean distance or cosine similarity.
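As a small illustration, assuming the embeddings dictionary built above and that the words used here actually appear in the vocabulary, you could compare a pair of related words against an unrelated pair using cosine similarity (values closer to 1 indicate more similar vectors):

import numpy as np

def cosine_similarity(u, v):
    # Cosine of the angle between the two vectors: close to 1 for similar directions.
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

road = np.array(embeddings['road'])
highway = np.array(embeddings['highway'])
banana = np.array(embeddings['banana'])

print(cosine_similarity(road, highway))  # expected to be relatively high
print(cosine_similarity(road, banana))   # expected to be lower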

You can visualize a 3D projection of several word embeddings in the TensorFlow Embedding Projector (https://projector.tensorflow.org), and see, for example, that the closest words to "roads" are "highways", "road", and "routes" in the Word2Vec 10K embedding.

For a more detailed explanation I recommend reading the "Word Embeddings" section of Christopher Olah's post "Deep Learning, NLP, and Representations".

For more theory on why using word embeddings, which are an instance of distributed representations, is better than using, for example, one-hot encodings (local representations), I recommend reading the first sections of Distributed Representations by Geoffrey Hinton et al.
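To make the contrast concrete, here is a purely illustrative sketch (the vocabulary and the dense vector are made up) of a one-hot encoding versus a word embedding:

# One-hot (local) representation: a vector as long as the vocabulary,
# with a single 1 at the word's position.
vocabulary = ['people', 'road', 'banana']
one_hot_road = [1 if word == 'road' else 0 for word in vocabulary]
print(one_hot_road)     # [0, 1, 0]; its length grows with the vocabulary

# Word embedding (distributed representation): a short dense vector
# whose length (here 4) is independent of the vocabulary size.
dense_road = [0.12, -0.80, 0.33, 0.05]
print(len(dense_road))  # 4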