I am trying to fine tune resnet 50 with keras. When I freeze all the layers in resnet50, everything works OK. However, I want to freeze some layers of resnet50, not all of them. But when I do this, I get some errors. Here is my code:
base_model = ResNet50(include_top=False, weights="imagenet", input_shape=(input_size, input_size, input_channels))
model = Sequential()
model.add(base_model)
model.add(Flatten())
model.add(Dense(80, activation="softmax"))
#this is where the error happens. The commented code works fine
"""
for layer in base_model.layers:
layer.trainable = False
"""
for layer in base_model.layers[:-26]:
layer.trainable = False
model.summary()
optimizer = Adam(lr=1e-4)
model.compile(loss="categorical_crossentropy", optimizer=optimizer, metrics=["accuracy"])
callbacks = [
EarlyStopping(monitor='val_loss', patience=4, verbose=1, min_delta=1e-4),
ReduceLROnPlateau(monitor='val_loss', factor=0.1, patience=2, cooldown=2, verbose=1),
ModelCheckpoint(filepath='weights/renet50_best_weight.fold_' + str(fold_count) + '.hdf5', save_best_only=True,
save_weights_only=True)
]
model.load_weights(filepath="weights/renet50_best_weight.fold_1.hdf5")
model.fit_generator(generator=train_generator(), steps_per_epoch=len(df_train) // batch_size, epochs=epochs, verbose=1,
callbacks=callbacks, validation_data=valid_generator(), validation_steps = len(df_valid) // batch_size)
the error is as follows:
Traceback (most recent call last):
File "/home/jamesben/ai_challenger/src/train.py", line 184, in <module> model.load_weights(filepath="weights/renet50_best_weight.fold_" + str(fold_count) + '.hdf5')
File "/usr/local/lib/python3.5/dist-packages/keras/models.py", line 719, in load_weights topology.load_weights_from_hdf5_group(f, layers)
File "/usr/local/lib/python3.5/dist-packages/keras/engine/topology.py", line 3095, in load_weights_from_hdf5_group K.batch_set_value(weight_value_tuples)
File "/usr/local/lib/python3.5/dist-packages/keras/backend/tensorflow_backend.py", line 2193, in batch_set_value get_session().run(assign_ops, feed_dict=feed_dict)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 767, in run run_metadata_ptr)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 944, in _run % (np_val.shape, subfeed_t.name, str(subfeed_t.get_shape())))
ValueError: Cannot feed value of shape (128,) for Tensor 'Placeholder_72:0', which has shape '(3, 3, 128, 128)'
can anyone give me some help on how many layer should I freeze with resnet50?
When using load_weights()
and save_weights()
with a nested model, it's very easy to get an error if the trainable
settings are not the same.
To solve the error, make sure you freeze the same layers before calling model.load_weights()
. That is, if the weight file is saved with all layers frozen, the procedure will be:
base_model
base_model.layers[-26:]
)For example,
base_model = ResNet50(include_top=False, input_shape=(224, 224, 3))
model = Sequential()
model.add(base_model)
model.add(Flatten())
model.add(Dense(80, activation="softmax"))
for layer in base_model.layers:
layer.trainable = False
model.load_weights('all_layers_freezed.h5')
for layer in base_model.layers[-26:]:
layer.trainable = True
When you call model.load_weights()
, (roughly) the weight for each layer is loaded by the following steps (in the function load_weights_from_hdf5_group()
in topology.py):
layer.weights
to get the weight tensorsK.batch_set_value()
to assign the weight values to the weight tensorsIf your model is a nested model, you have to be careful about trainable
because of Step 1.
I'll use an example to explain it. For the same model as above, model.summary()
gives:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
resnet50 (Model) (None, 1, 1, 2048) 23587712
_________________________________________________________________
flatten_10 (Flatten) (None, 2048) 0
_________________________________________________________________
dense_5 (Dense) (None, 80) 163920
=================================================================
Total params: 23,751,632
Trainable params: 11,202,640
Non-trainable params: 12,548,992
_________________________________________________________________
The inner ResNet50
model is treated as a layer of model
during weight loading. When loading the layer resnet50
, in Step 1, calling layer.weights
is equivalent to calling base_model.weights
. The list of weight tensors for all layers in the ResNet50
model will be collected and returned.
Now the problem is that, when constructing the list of weight tensors, trainable weights will come before non-trainable weights. In the definition of Layer
class:
@property
def weights(self):
return self.trainable_weights + self.non_trainable_weights
If all layers in base_model
are frozen, the weight tensors will be in the following order:
for layer in base_model.layers:
layer.trainable = False
print(base_model.weights)
[<tf.Variable 'conv1/kernel:0' shape=(7, 7, 3, 64) dtype=float32_ref>,
<tf.Variable 'conv1/bias:0' shape=(64,) dtype=float32_ref>,
<tf.Variable 'bn_conv1/gamma:0' shape=(64,) dtype=float32_ref>,
<tf.Variable 'bn_conv1/beta:0' shape=(64,) dtype=float32_ref>,
<tf.Variable 'bn_conv1/moving_mean:0' shape=(64,) dtype=float32_ref>,
<tf.Variable 'bn_conv1/moving_variance:0' shape=(64,) dtype=float32_ref>,
<tf.Variable 'res2a_branch2a/kernel:0' shape=(1, 1, 64, 64) dtype=float32_ref>,
<tf.Variable 'res2a_branch2a/bias:0' shape=(64,) dtype=float32_ref>,
...
<tf.Variable 'res5c_branch2c/kernel:0' shape=(1, 1, 512, 2048) dtype=float32_ref>,
<tf.Variable 'res5c_branch2c/bias:0' shape=(2048,) dtype=float32_ref>,
<tf.Variable 'bn5c_branch2c/gamma:0' shape=(2048,) dtype=float32_ref>,
<tf.Variable 'bn5c_branch2c/beta:0' shape=(2048,) dtype=float32_ref>,
<tf.Variable 'bn5c_branch2c/moving_mean:0' shape=(2048,) dtype=float32_ref>,
<tf.Variable 'bn5c_branch2c/moving_variance:0' shape=(2048,) dtype=float32_ref>]
However, if some layers are trainable, the weight tensors of the trainable layers will come before that of the frozen ones:
for layer in base_model.layers[-5:]:
layer.trainable = True
print(base_model.weights)
[<tf.Variable 'res5c_branch2c/kernel:0' shape=(1, 1, 512, 2048) dtype=float32_ref>,
<tf.Variable 'res5c_branch2c/bias:0' shape=(2048,) dtype=float32_ref>,
<tf.Variable 'bn5c_branch2c/gamma:0' shape=(2048,) dtype=float32_ref>,
<tf.Variable 'bn5c_branch2c/beta:0' shape=(2048,) dtype=float32_ref>,
<tf.Variable 'conv1/kernel:0' shape=(7, 7, 3, 64) dtype=float32_ref>,
<tf.Variable 'conv1/bias:0' shape=(64,) dtype=float32_ref>,
<tf.Variable 'bn_conv1/gamma:0' shape=(64,) dtype=float32_ref>,
<tf.Variable 'bn_conv1/beta:0' shape=(64,) dtype=float32_ref>,
<tf.Variable 'bn_conv1/moving_mean:0' shape=(64,) dtype=float32_ref>,
<tf.Variable 'bn_conv1/moving_variance:0' shape=(64,) dtype=float32_ref>,
<tf.Variable 'res2a_branch2a/kernel:0' shape=(1, 1, 64, 64) dtype=float32_ref>,
<tf.Variable 'res2a_branch2a/bias:0' shape=(64,) dtype=float32_ref>,
...
<tf.Variable 'bn5c_branch2b/moving_mean:0' shape=(512,) dtype=float32_ref>,
<tf.Variable 'bn5c_branch2b/moving_variance:0' shape=(512,) dtype=float32_ref>,
<tf.Variable 'bn5c_branch2c/moving_mean:0' shape=(2048,) dtype=float32_ref>,
<tf.Variable 'bn5c_branch2c/moving_variance:0' shape=(2048,) dtype=float32_ref>]
The change in order is why you got an error about tensor shapes. The weight values saved in the hdf5 file are matched to the wrong weight tensors in Step 2 mentioned above. The reason that everything works fine when you freeze all layers is because your model checkpoint is saved also with all layers frozen and thus the order is correct.
You can avoid a nested model by using the functional API. For example, the following code should work without error:
base_model = ResNet50(include_top=False, weights="imagenet", input_shape=(input_size, input_size, input_channels))
x = Flatten()(base_model.output)
x = Dense(80, activation="softmax")(x)
model = Model(base_model.input, x)
for layer in base_model.layers:
layer.trainable = False
model.save_weights("all_nontrainable.h5")
base_model = ResNet50(include_top=False, weights="imagenet", input_shape=(input_size, input_size, input_channels))
x = Flatten()(base_model.output)
x = Dense(80, activation="softmax")(x)
model = Model(base_model.input, x)
for layer in base_model.layers[:-26]:
layer.trainable = False
model.load_weights("all_nontrainable.h5")