I'm trying to create a simple weighted loss function.
Say I have inputs with dimensions 100 * 5 and outputs also with dimensions 100 * 5, along with a weight matrix of the same dimensions.
Something like the following:
import numpy as np
from keras.layers import Dense, Input
from keras import Model
import keras.backend as K

# Toy data: targets are a slightly noisy copy of the inputs
train_X = np.random.randn(100, 5)
train_Y = np.random.randn(100, 5) * 0.01 + train_X
weights = np.random.randn(*train_X.shape)

def custom_loss_1(y_true, y_pred):
    return K.mean(K.abs(y_true - y_pred) * weights)

input_layer = Input(shape=(5,))
out = Dense(5)(input_layer)
model = Model(input_layer, out)

# Sanity check with the built-in loss first
model.compile('adam', 'mean_absolute_error')
model.fit(train_X, train_Y, epochs=1)

# Then switch to the custom weighted loss
model.compile('adam', custom_loss_1)
model.fit(train_X, train_Y, epochs=10)
It gives the following stack trace:
InvalidArgumentError (see above for traceback): Incompatible shapes: [32,5] vs. [100,5]
[[Node: loss_9/dense_8_loss/mul = Mul[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"](loss_9/dense_8_loss/Abs, loss_9/dense_8_loss/mul/y)]]
Where is the number 32 coming from?
def custom_loss_2(y_true, y_pred):
    return K.mean(K.abs(y_true - y_pred) * K.ones_like(y_true))
This function seems to do the job, which suggests that a Keras tensor would work as the weight matrix. So I created another version of the loss function:
from functools import partial

def custom_loss_3(y_true, y_pred, weights):
    return K.mean(K.abs(y_true - y_pred) * K.variable(weights, dtype=y_true.dtype))

cl3 = partial(custom_loss_3, weights=weights)
Fitting the data using cl3 gives the same error as above:
InvalidArgumentError (see above for traceback): Incompatible shapes: [32,5] vs. [100,5]
[[Node: loss_11/dense_8_loss/mul = Mul[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"](loss_11/dense_8_loss/Abs, loss_11/dense_8_loss/Variable/read)]]
I wonder what I'm missing! I could have used the notion of sample_weight in Keras (sketched below), but then I'd have to reshape my inputs to a 3D array.
I thought that this custom loss function should really have been trivial.
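For reference, here's roughly what the sample_weight route would look like. This is only a sketch, assuming Keras's sample_weight_mode='temporal' (which accepts a (samples, timesteps) weight array) and treating the 5 output columns as "timesteps", which forces the 3D reshape mentioned above:

import numpy as np
from keras.layers import Dense, Input, Reshape
from keras import Model

train_X = np.random.randn(100, 5)
train_Y = np.random.randn(100, 5) * 0.01 + train_X
weights = np.random.randn(*train_X.shape)

input_layer = Input(shape=(5,))
out = Dense(5)(input_layer)
out = Reshape((5, 1))(out)  # targets must be 3D for temporal weighting
model = Model(input_layer, out)

model.compile('adam', 'mean_absolute_error', sample_weight_mode='temporal')
model.fit(train_X, train_Y.reshape(100, 5, 1), sample_weight=weights, epochs=10)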
The batch size in model.fit is 32 by default; that's where this number is coming from. Here's what's happening:

In custom_loss_1, the tensor K.abs(y_true - y_pred) has shape (batch_size=32, 5), while the numpy array weights has shape (100, 5). This is an invalid multiplication, since the dimensions don't agree and broadcasting can't be applied.

In custom_loss_2, this problem doesn't exist because you're multiplying two tensors with the same shape, (batch_size=32, 5).

In custom_loss_3, the problem is the same as in custom_loss_1, because converting weights into a Keras variable doesn't change its shape.
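You can reproduce the shape clash with plain NumPy, outside of Keras (a minimal illustration, using ones in place of the real values):

import numpy as np

batch_errors = np.ones((32, 5))   # shape of K.abs(y_true - y_pred) for one batch
full_weights = np.ones((100, 5))  # shape of the full weights array

batch_errors * full_weights  # ValueError: operands could not be broadcast together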
UPDATE: It seems you want to give a different weight to each element of each training sample, so the weights array should indeed have shape (100, 5).
In this case, I would feed your weights array into the model as an additional input and then use that tensor within the loss function:
import numpy as np
from keras.layers import Dense, Input
from keras import Model
import keras.backend as K
from functools import partial

def custom_loss_4(y_true, y_pred, weights):
    return K.mean(K.abs(y_true - y_pred) * weights)

train_X = np.random.randn(100, 5)
train_Y = np.random.randn(100, 5) * 0.01 + train_X
weights = np.random.randn(*train_X.shape)

# The weights enter the model as a second input, so Keras slices them
# into mini-batches in sync with train_X and train_Y
input_layer = Input(shape=(5,))
weights_tensor = Input(shape=(5,))
out = Dense(5)(input_layer)

cl4 = partial(custom_loss_4, weights=weights_tensor)

model = Model([input_layer, weights_tensor], out)
model.compile('adam', cl4)
model.fit(x=[train_X, weights], y=train_Y, epochs=10)
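Note that because the weights are now a model input, predict also expects both inputs. The weight input is only consumed by the loss, so a placeholder such as ones works at inference time:

# the weight input doesn't affect the forward pass, only the loss
preds = model.predict([train_X, np.ones_like(weights)])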