How to manually specify class labels in keras flow_from_directory?

Malte · Mar 29, 2017 · Viewed 11.2k times

Problem: I am training a model for multilabel image recognition, so each image is associated with multiple y labels. This conflicts with the convenient Keras method "flow_from_directory" of ImageDataGenerator, which expects each image to be in the folder of its corresponding label (https://keras.io/preprocessing/image/).

Workaround: Currently, I read all images into a NumPy array and use the "flow" method from there. But this uses a lot of memory and makes the read-in process slow.

Question: Is there a way to use the "flow_from_directory" method and manually supply the (multiple) class labels?


Update: I ended up extending the DirectoryIterator class for the multilabel case. You can now set the attribute "class_mode" to the value "multilabel" and provide a dictionary "multlabel_classes" which maps filenames to their labels. Code: https://github.com/tholor/keras/commit/29ceafca3c4792cb480829c5768510e4bdb489c5
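To illustrate the general idea of mapping filenames to labels, here is a minimal sketch in plain Python and NumPy. It is not the code from the linked commit and does not subclass DirectoryIterator. The names multilabel_batches, filename_to_labels and load_image are hypothetical, and load_image stands in for whatever image-loading function you actually use. The point is only that images are loaded one batch at a time instead of all at once.

```python
import numpy as np

def multilabel_batches(filename_to_labels, num_classes, load_image, batch_size=32):
    """Yield (images, multi-hot labels) batches indefinitely, loading images lazily.

    filename_to_labels: dict mapping a file path to a list of integer class indices.
    load_image: hypothetical callable that turns a file path into a NumPy array.
    """
    filenames = sorted(filename_to_labels)
    while True:  # Keras-style generators loop forever; the caller decides when to stop
        for start in range(0, len(filenames), batch_size):
            chunk = filenames[start:start + batch_size]
            # Only the images of the current batch are held in memory.
            x = np.stack([load_image(name) for name in chunk])
            y = np.zeros((len(chunk), num_classes), dtype="float32")
            for row, name in enumerate(chunk):
                y[row, filename_to_labels[name]] = 1.0  # set every label of this image
            yield x, y

# Demo with made-up filenames and a dummy loader that returns blank images.
labels = {"a.jpg": [0, 2], "b.jpg": [1], "c.jpg": [0, 1, 2]}
dummy_loader = lambda path: np.zeros((4, 4, 3), dtype="float32")
gen = multilabel_batches(labels, num_classes=3, load_image=dummy_loader, batch_size=2)
x1, y1 = next(gen)  # batch for a.jpg and b.jpg
x2, y2 = next(gen)  # final, smaller batch for c.jpg
```

This sketch does no shuffling or augmentation, both of which you would probably want for real training.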

Answer

Marcin Możejko · Mar 29, 2017

You could simply use flow_from_directory and extend it to the multilabel case in the following manner:

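def multiclass_flow_from_directory(flow_from_directory_gen, multiclasses_getter):
    # x is a batch of images, y the batch of labels inferred from the folders.
    for x, y in flow_from_directory_gen:
        # Swap the folder-derived labels for your own multilabel targets.
        yield x, multiclasses_getter(x, y)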

Here multiclasses_getter assigns a multiclass vector (or whatever multiclass representation you use) to your images. Note that x and y are not single examples but batches of examples, so multiclasses_getter must be designed to operate on whole batches.
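As a rough sketch of what a batch-aware multiclasses_getter could look like, the example below assumes a layout that is not stated in the question: one folder per label combination (e.g. "cat", "cat_dog", "dog") and class_mode='sparse', so that y is a batch of integer folder indices. The folder names, LABELS, FOLDERS, LOOKUP and fake_directory_gen are all made up for illustration. The fake generator only stands in for a real flow_from_directory generator so that the snippet runs without any images.

```python
import numpy as np

def multiclass_flow_from_directory(flow_from_directory_gen, multiclasses_getter):
    for x, y in flow_from_directory_gen:
        yield x, multiclasses_getter(x, y)

# Hypothetical setup: one folder per label combination.
# Keras assigns folder indices in alphanumeric order; verify with generator.class_indices.
LABELS = ["cat", "dog"]
FOLDERS = ["cat", "cat_dog", "dog"]

# Row i of LOOKUP is the multi-hot label vector for folder index i.
LOOKUP = np.array(
    [[1.0 if label in folder.split("_") else 0.0 for label in LABELS]
     for folder in FOLDERS],
    dtype="float32",
)

def multiclasses_getter(x, y):
    """Map a whole batch of folder indices to a batch of multi-hot vectors."""
    return LOOKUP[np.asarray(y).astype(int)]

def fake_directory_gen():
    """Stand-in for a flow_from_directory generator (finite, for demonstration only)."""
    yield np.zeros((3, 8, 8, 3), dtype="float32"), np.array([0.0, 1.0, 2.0])
    yield np.zeros((2, 8, 8, 3), dtype="float32"), np.array([1.0, 1.0])

batches = list(multiclass_flow_from_directory(fake_directory_gen(), multiclasses_getter))
```

Because the lookup is a single array indexing operation, it handles the whole batch at once and works for any batch size, including a smaller final batch. Unlike the fake generator here, a real flow_from_directory generator loops indefinitely, so do not wrap it in list().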