Unable to train my Keras model: "Data cardinality is ambiguous"

Amal Vijayan · Dec 3, 2019 · Viewed 14.9k times · Source

I am using the bert-for-tf2 library to do a Multi-Class Classification problem. I created the model but training throws the following error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-25-d9f382cba5d4> in <module>()
----> 1 model.fit([INPUT_IDS,INPUT_MASKS,INPUT_SEGS], list(train.SECTION))

5 frames
/tensorflow-2.0.0/python3.6/tensorflow_core/python/keras/engine/data_adapter.py in 
__init__(self, x, y, sample_weights, batch_size, epochs, steps, shuffle, **kwargs)
243             label, ", ".join([str(i.shape[0]) for i in nest.flatten(data)]))
244       msg += "Please provide data which shares the same first dimension."
--> 245       raise ValueError(msg)
246     num_samples = num_samples.pop()
247 

ValueError: Data cardinality is ambiguous:
x sizes: 3
y sizes: 6102
Please provide data which shares the same first dimension.

I am following the Medium article called Simple BERT using TensorFlow 2.0. The Git repo for the bert-for-tf2 library can be found here.

Please find the entire code here.

Here is a link to my colab notebook

Really appreciate your help!

Answer

Yoganand · Dec 10, 2019

I had the same issue. The error is raised by one of Keras's data adapters when the first dimension of x does not match the first dimension of y (x.shape[0] != y.shape[0]). In this case the list of three inputs is treated as a single array whose first dimension is 3, while y has 6102 samples:

x = [INPUT_IDS, INPUT_MASKS, INPUT_SEGS]   # first dimension seen as 3
y = list(train.SECTION)                    # 6102 labels

so instead of

model.fit([INPUT_IDS,INPUT_MASKS,INPUT_SEGS], list(train.SECTION))

try passing the inputs and outputs as dictionaries keyed by layer name (check the model summary; suitable names can also be set explicitly when building the model). This worked for me:

model.fit(
    {
        "input_word_ids": INPUT_IDS,
        "input_mask": INPUT_MASKS,
        "segment_ids": INPUT_SEGS,
    },
    {"dense_1": list(train.SECTION)},
)

Also make sure the inputs and outputs are NumPy arrays, e.g. by converting them with np.asarray(); the data adapter reads the .shape attribute, which plain Python lists do not have.
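For illustration, here is a NumPy-only sketch of why the cardinality check can report "x sizes: 3" against "y sizes: 6102". The array names and sizes are hypothetical stand-ins for the BERT inputs, not taken from the original model; the real check lives in Keras's data_adapter.py:

```python
import numpy as np

# Hypothetical stand-ins for the three BERT inputs: 6102 samples,
# sequence length 128 (sizes chosen to match the error message).
n_samples, seq_len = 6102, 128
input_ids = [[0] * seq_len for _ in range(n_samples)]    # plain Python lists
input_masks = [[1] * seq_len for _ in range(n_samples)]
input_segs = [[0] * seq_len for _ in range(n_samples)]
labels = [0] * n_samples

# If the outer list of plain lists is converted to a single array, its first
# dimension is 3 (the number of inputs), not 6102 (the number of samples) --
# matching the "x sizes: 3" in the ValueError.
stacked = np.asarray([input_ids, input_masks, input_segs])
assert stacked.shape[0] == 3

# Fix: convert each input (and the labels) to its own NumPy array so every
# array shares the same first dimension as y.
x = [np.asarray(input_ids), np.asarray(input_masks), np.asarray(input_segs)]
y = np.asarray(labels)
assert all(arr.shape[0] == y.shape[0] == 6102 for arr in x)
```

With each input as its own array (or as a value in the name-keyed dictionary above), the adapter sees three arrays that all share first dimension 6102, and the cardinality check passes.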