Problem: I am training a model for multilabel image recognition. My images are therefore associated with multiple y labels. This is conflicting with the convenient keras method "flow_from_directory" of the ImageDataGenerator, where each image is supposed to be in the folder of the corresponding label (https://keras.io/preprocessing/image/).
Workaround: Currently, I am reading all images into a numpy array and use the "flow" function from there. But this results in heavy memory loads and a slow read-in process.
Question: Is there a way to use the "flow_from_directory" method and to supply manually the (multiple) class labels?
Update: I ended up extending the DirectoryIterator class for the multilabel case. You can now set the attribute "class_mode" to the value "multilabel" and provide a dictionary "multlabel_classes" which maps filenames to their labels. Code: https://github.com/tholor/keras/commit/29ceafca3c4792cb480829c5768510e4bdb489c5
You could simply use the flow_from_directory
and extend it to a multiclass in a following manner:
def multiclass_flow_from_directory(flow_from_directory_gen, multiclasses_getter):
for x, y in flow_from_directory_gen:
yield x, multiclasses_getter(x, y)
Where multiclasses_getter
is assigning a multiclass vector / your multiclass representation to your images. Note that x
and y
are not a single examples but batches of examples, so this should be included in your multiclasses_getter
design.