I am using the bert-for-tf2 library for a multi-class classification problem. I created the model, but training throws the following error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-25-d9f382cba5d4> in <module>()
----> 1 model.fit([INPUT_IDS,INPUT_MASKS,INPUT_SEGS], list(train.SECTION))
/tensorflow-2.0.0/python3.6/tensorflow_core/python/keras/engine/data_adapter.py in
__init__(self, x, y, sample_weights, batch_size, epochs, steps, shuffle, **kwargs)
243 label, ", ".join([str(i.shape[0]) for i in nest.flatten(data)]))
244 msg += "Please provide data which shares the same first dimension."
--> 245 raise ValueError(msg)
246 num_samples = num_samples.pop()
247
ValueError: Data cardinality is ambiguous:
x sizes: 3
y sizes: 6102
Please provide data which shares the same first dimension.
I am following the Medium article called Simple BERT using TensorFlow 2.0. The Git repo for the bert-for-tf2 library can be found here.
Please find the entire code here.
Here is a link to my Colab notebook.
Really appreciate your help!
I had the same issue. I'm not sure why the number of inputs and outputs should have to match, but the error appears to be raised from one of the Keras data adapters when x.shape[0] != y.shape[0]. In this case:
x = [INPUT_IDS,INPUT_MASKS,INPUT_SEGS]
y = list(train.SECTION)
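As a side note, the numbers in the message line up with the shapes involved. Here is a minimal sketch of what gets compared (the 6102 comes from the error message; the sequence length of 128 is an assumption):

import numpy as np

# Hypothetical shapes: 6102 examples (from the error message), with an
# assumed sequence length of 128.
INPUT_IDS = np.zeros((6102, 128), dtype=np.int32)
INPUT_MASKS = np.zeros((6102, 128), dtype=np.int32)
INPUT_SEGS = np.zeros((6102, 128), dtype=np.int32)

# If the three inputs get coerced into one array, its first dimension is 3,
# which is exactly what the error reports as "x sizes: 3":
stacked = np.asarray([INPUT_IDS, INPUT_MASKS, INPUT_SEGS])
print(stacked.shape)  # (3, 6102, 128)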
So instead of
model.fit([INPUT_IDS, INPUT_MASKS, INPUT_SEGS], list(train.SECTION))
try passing the inputs and outputs as dictionaries keyed by layer name (check the model summary for the names; suitable names can also be given explicitly, as sketched after the example below). This worked for me:
model.fit(
    {
        "input_word_ids": INPUT_IDS,
        "input_mask": INPUT_MASKS,
        "segment_ids": INPUT_SEGS,
    },
    {"dense_1": list(train.SECTION)},
)
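If your layer names differ, you can set them explicitly when building the model so the dictionary keys are predictable. A minimal runnable sketch, with a plain dense layer standing in for the actual BERT layer (MAX_SEQ_LEN and NUM_CLASSES are assumptions):

import tensorflow as tf

MAX_SEQ_LEN = 128  # assumed sequence length
NUM_CLASSES = 6    # assumed number of target classes

input_word_ids = tf.keras.Input(shape=(MAX_SEQ_LEN,), name="input_word_ids")
input_mask = tf.keras.Input(shape=(MAX_SEQ_LEN,), name="input_mask")
segment_ids = tf.keras.Input(shape=(MAX_SEQ_LEN,), name="segment_ids")

# Stand-in for the BERT layer, only to keep the sketch self-contained:
merged = tf.keras.layers.concatenate([input_word_ids, input_mask, segment_ids])
output = tf.keras.layers.Dense(NUM_CLASSES, activation="softmax", name="dense_1")(merged)

model = tf.keras.Model([input_word_ids, input_mask, segment_ids], output)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.summary()  # the printed layer names are the keys to use in fit()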
Please also make sure that the inputs and outputs are NumPy arrays, for example by converting them with np.asarray(); the data adapter looks for a .shape attribute, which plain Python lists don't have.
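Putting it together, a sketch of the corrected call using the variables from the question (epochs and batch_size are arbitrary placeholders):

import numpy as np

# Convert everything to NumPy arrays up front; the data adapter needs .shape:
x = {
    "input_word_ids": np.asarray(INPUT_IDS),
    "input_mask": np.asarray(INPUT_MASKS),
    "segment_ids": np.asarray(INPUT_SEGS),
}
y = {"dense_1": np.asarray(list(train.SECTION))}

model.fit(x, y, epochs=3, batch_size=32)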