WARNING: WARNING:tensorflow:Model was constructed with shape (None, 150) , but it was called on an input with incompatible shape (None, 1)

DolceVita34 picture DolceVita34 · May 7, 2020 · Viewed 12.8k times · Source

So I'm trying to build a word embedding model but I keep getting this error. During training, the accuracy does not change and the val_loss remains "nan"

The raw shape of the data is

x.shape, y.shape
((94556,), (94556, 2557))

Then I reshape it so:

xr= np.asarray(x).astype('float32').reshape((-1,1))
yr= np.asarray(y).astype('float32').reshape((-1,1))
((94556, 1), (241779692, 1))

Then I run it through my model

model = Sequential()
model.add(Embedding(2557, 64, input_length=150, embeddings_initializer='glorot_uniform'))
model.add(Flatten())
model.add(Reshape((64,), input_shape=(94556, 1)))
model.add(Dense(512, activation='sigmoid'))
model.add(Dense(128, activation='sigmoid'))
model.add(Dense(64, activation='relu'))
model.add(Dense(10, activation='sigmoid'))
model.add(Dense(1, activation='relu'))
# compile the mode
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
# summarize the model
print(model.summary())
plot_model(model, show_shapes = True, show_layer_names=False)

After training, I get a constant accuracy and a val_loss nan for every epoch

history=model.fit(xr, yr, epochs=20, batch_size=32, validation_split=3/9)

Epoch 1/20
WARNING:tensorflow:Model was constructed with shape (None, 150) for input Tensor("embedding_6_input:0", shape=(None, 150), dtype=float32), but it was called on an input with incompatible shape (None, 1).
WARNING:tensorflow:Model was constructed with shape (None, 150) for input Tensor("embedding_6_input:0", shape=(None, 150), dtype=float32), but it was called on an input with incompatible shape (None, 1).
1960/1970 [============================>.] - ETA: 0s - loss: nan - accuracy: 0.9996WARNING:tensorflow:Model was constructed with shape (None, 150) for input Tensor("embedding_6_input:0", shape=(None, 150), dtype=float32), but it was called on an input with incompatible shape (None, 1).
1970/1970 [==============================] - 7s 4ms/step - loss: nan - accuracy: 0.9996 - val_loss: nan - val_accuracy: 0.9996
Epoch 2/20
1970/1970 [==============================] - 7s 4ms/step - loss: nan - accuracy: 0.9996 - val_loss: nan - val_accuracy: 0.9996
Epoch 3/20
1970/1970 [==============================] - 7s 4ms/step - loss: nan - accuracy: 0.9996 - val_loss: nan - val_accuracy: 0.9996
Epoch 4/20
1970/1970 [==============================] - 8s 4ms/step - loss: nan - accuracy: 0.9996 - val_loss: nan - val_accuracy: 0.9996
Epoch 5/20
1970/1970 [==============================] - 7s 4ms/step - loss: nan - accuracy: 0.9996 - val_loss: nan - val_accuracy: 0.9996
Epoch 6/20
1970/1970 [==============================] - 7s 4ms/step - loss: nan - accuracy: 0.9996 - val_loss: nan - val_accuracy: 0.9996
Epoch 7/20
1970/1970 [==============================] - 7s 4ms/step - loss: nan - accuracy: 0.9996 - val_loss: nan - val_accuracy: 0.9996
Epoch 8/20
1970/1970 [==============================] - 7s 4ms/step - loss: nan - accuracy: 0.9996 - val_loss: nan - val_accuracy: 0.9996
Epoch 9/20
1970/1970 [==============================] - 7s 4ms/step - loss: nan - accuracy: 0.9996 - val_loss: nan - val_accuracy: 0.9996
Epoch 10/20
1970/1970 [==============================] - 7s 4ms/step - loss: nan - accuracy: 0.9996 - val_loss: nan - val_accuracy: 0.9996
Epoch 11/20
1970/1970 [==============================] - 8s 4ms/step - loss: nan - accuracy: 0.9996 - val_loss: nan - val_accuracy: 0.9996
Epoch 12/20
1970/1970 [==============================] - 7s 4ms/step - loss: nan - accuracy: 0.9996 - val_loss: nan - val_accuracy: 0.9996
Epoch 13/20
1970/1970 [==============================] - 7s 4ms/step - loss: nan - accuracy: 0.9996 - val_loss: nan - val_accuracy: 0.9996
Epoch 14/20
1970/1970 [==============================] - 7s 4ms/step - loss: nan - accuracy: 0.9996 - val_loss: nan - val_accuracy: 0.9996
Epoch 15/20
1970/1970 [==============================] - 8s 4ms/step - loss: nan - accuracy: 0.9996 - val_loss: nan - val_accuracy: 0.9996
Epoch 16/20
1970/1970 [==============================] - 7s 4ms/step - loss: nan - accuracy: 0.9996 - val_loss: nan - val_accuracy: 0.9996
Epoch 17/20
1970/1970 [==============================] - 7s 4ms/step - loss: nan - accuracy: 0.9996 - val_loss: nan - val_accuracy: 0.9996
Epoch 18/20
1970/1970 [==============================] - 7s 4ms/step - loss: nan - accuracy: 0.9996 - val_loss: nan - val_accuracy: 0.9996
Epoch 19/20
1970/1970 [==============================] - 7s 4ms/step - loss: nan - accuracy: 0.9996 - val_loss: nan - val_accuracy: 0.9996
Epoch 20/20
1970/1970 [==============================] - 7s 4ms/step - loss: nan - accuracy: 0.9996 - val_loss: nan - val_accuracy: 0.9996

I think it has to do whit the input/output shape but I'm not certain. I tried modifying the model in various ways, adding layers/ removing layers/ different optimizers/ different batch sizes and nothing worked so far.

Answer

Tawy picture Tawy · May 7, 2020

Ok so, here is what I understood, correct me if I'm wrong:

  • x contains 94556 integers, each being the index of one out of 2557 words.
  • y contains 94556 vectors of 2557 integers, each containing also the index of one word, but this time it is a one-hot encoding instead of a categorical encoding.
  • Finally, a corresponding pair of words from x and y represents two words that are close by in the original text.

If I am correct so far, then the following runs correctly:

import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import *
from tensorflow.keras.models import *

x = np.random.randint(0,2557,94556)
y = np.eye((2557))[np.random.randint(0,2557,94556)]
xr = x.reshape((-1,1))


print("x.shape: {}\nxr.shape:{}\ny.shape: {}".format(x.shape, xr.shape, y.shape))


model = Sequential()
model.add(Embedding(2557, 64, input_length=1, embeddings_initializer='glorot_uniform'))
model.add(Reshape((64,)))
model.add(Dense(512, activation='sigmoid'))
model.add(Dense(2557, activation='softmax'))

model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.summary()

history=model.fit(xr, y, epochs=20, batch_size=32, validation_split=3/9)

The most import modifications:

  • The y reshaping was losing the relationship between elements from x and y.
  • The input_length in the Embedding layer should correspond to the second dimension of xr.
  • The output of the last layer from the network should be the same dimension as the second dimension of y.

I am actually surprised the code ran without crashing.

Finally, from my research, it seems that people are not training skipgrams like this in practice, but rather they are trying to predict whether a training example is correct (the two words are close by) or not. Maybe this is the reason you came up with an output of dimension one.

Here is a model inspired from https://github.com/PacktPublishing/Deep-Learning-with-Keras/blob/master/Chapter05/keras_skipgram.py :

word_model = Sequential()
word_model.add(Embedding(2557, 64, embeddings_initializer="glorot_uniform", input_length=1))
word_model.add(Reshape((embed_size,)))

context_model = Sequential()
context_model.add(Embedding(2557, 64, embeddings_initializer="glorot_uniform", input_length=1))
context_model.add(Reshape((64,)))

model = Sequential()
model.add(Merge([word_model, context_model], mode="dot", dot_axes=0))
model.add(Dense(1, kernel_initializer="glorot_uniform", activation="sigmoid"))

In that case, you would have 3 vectors, all from the same size (94556, 1) (or probably even bigger than 94556, since you might have to generate additional negative samples):

  • x containing integers from 0 to 2556
  • y containing integers from 0 to 2556
  • output containing 0s and 1s, whether each pair from x and y is a negative or a positive example

and the training would look like:

history = model.fit([x, y], output, epochs=20, batch_size=32, validation_split=3/9)