I am using model.fit_generator to train and evaluate my binary (two-class) model, because I am feeding input images directly from my folders. How can I get the confusion matrix (TP, TN, FP, FN) in this case? Normally I use the confusion_matrix function from sklearn.metrics, which requires both the predicted and the actual labels, but here I don't have either. Maybe I can compute the predicted labels with predict = model.predict_generator(validation_generator), but I don't know how my model assigns labels to my images. The general structure of my input folder is:
train/
    class1/
        img1.jpg
        img2.jpg
        ........
    class2/
        IMG1.jpg
        IMG2.jpg
test/
    class1/
        img1.jpg
        img2.jpg
        ........
    class2/
        IMG1.jpg
        IMG2.jpg
        ........
Some relevant parts of my code are:
train_generator = train_datagen.flow_from_directory(
    'train',
    target_size=(50, 50), batch_size=batch_size,
    class_mode='binary', color_mode='grayscale')

validation_generator = test_datagen.flow_from_directory(
    'test',
    target_size=(50, 50), batch_size=batch_size,
    class_mode='binary', color_mode='grayscale')

model.fit_generator(
    train_generator, steps_per_epoch=250, epochs=40,
    validation_data=validation_generator,
    validation_steps=21)
So the code above automatically picks up the two classes, but I don't know which one it treats as class 0 and which as class 1.
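On the question of which class becomes 0 and which becomes 1: the Keras documentation for flow_from_directory states that, when the classes argument is not given, the classes are inferred from the subdirectory names and mapped to label indices in alphanumeric order, and that the resulting mapping is available as the generator's class_indices attribute. The helper below, infer_class_indices, is hypothetical and only mimics that documented rule in plain Python to make it concrete; it is not part of Keras.

```python
import os


def infer_class_indices(directory):
    """Return {subdirectory_name: label}, numbered in sorted (alphanumeric) order.

    Illustrative helper mirroring flow_from_directory's documented default
    when `classes` is not passed; it is not part of the Keras API.
    """
    # Only subdirectories count as classes; plain files are ignored.
    subdirs = sorted(
        name for name in os.listdir(directory)
        if os.path.isdir(os.path.join(directory, name))
    )
    return {name: label for label, name in enumerate(subdirs)}
```

For the folder layout above this should give {'class1': 0, 'class2': 1}, so with class_mode='binary' a label of 1 means class2, and (assuming the model ends in a single sigmoid unit) its output is the predicted probability of class2. Printing train_generator.class_indices is the authoritative check rather than relying on this sketch.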
I've managed it in the following way, using keras.utils.Sequence.
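import numpy as np
from sklearn.metrics import confusion_matrix
from keras.utils import Sequence

class MySequence(Sequence):
    def __init__(self, data, batch_size=10):
        super().__init__()
        # data is assumed here to be an (inputs, one_hot_labels) pair of arrays;
        # a lazier version could keep file paths and load each batch from disk
        self.x, self.y = data
        self.batch_size = batch_size
        self.length = int(np.ceil(len(self.x) / float(batch_size)))

    def __len__(self):
        # number of batches
        return self.length

    def __getitem__(self, index):
        # return the index-th complete batch as (inputs, labels);
        # must return the same batch for the same index
        start = index * self.batch_size
        return (self.x[start:start + self.batch_size],
                self.y[start:start + self.batch_size])

# create data generator
data_gen = MySequence(evaluation_set, batch_size=10)
n_batches = len(data_gen)

confusion_matrix(
    # true class indices, taken from the one-hot labels of every batch
    np.concatenate([np.argmax(data_gen[i][1], axis=1) for i in range(n_batches)]),
    # predicted class indices from a second pass over the same batches
    np.argmax(model.predict_generator(data_gen, steps=n_batches), axis=1)
)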
The implemented class returns the data one batch at a time as (inputs, labels) tuples, so the whole dataset never has to be held in RAM. Note that the batching must be implemented in __getitem__, and this method must return the same batch for the same index.
Unfortunately, this code iterates over the data twice: the first pass builds the array of true labels from the returned batches, and the second happens when it calls the model's predict_generator method.
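One way to avoid the second pass is to fetch each batch once and run the model on it right away, for example with model.predict_on_batch. The sketch below assumes that; the helper names to_class_indices and collect_labels_and_predictions are hypothetical. The label conversion assumes either one-hot or softmax rows (handled with argmax) or, for the question's class_mode='binary', scalar labels and a single sigmoid output (thresholded at 0.5).

```python
def to_class_indices(batch, threshold=0.5):
    """Convert one batch of labels or model outputs to integer class indices.

    Rows with several values (one-hot labels, softmax outputs) use argmax;
    scalars or single-value rows (binary labels, one sigmoid output) are
    thresholded.
    """
    indices = []
    for row in batch:
        try:
            values = list(row)
        except TypeError:  # a plain scalar, e.g. a class_mode='binary' label
            values = [row]
        if len(values) == 1:
            indices.append(int(values[0] >= threshold))
        else:
            # argmax: index of the largest value (first one on ties)
            indices.append(max(range(len(values)), key=values.__getitem__))
    return indices


def collect_labels_and_predictions(seq, predict_batch, threshold=0.5):
    """Single pass over a Sequence-like object; returns (y_true, y_pred) lists."""
    y_true, y_pred = [], []
    for i in range(len(seq)):
        x_batch, y_batch = seq[i]  # each batch is fetched exactly once
        y_true.extend(to_class_indices(y_batch, threshold))
        y_pred.extend(to_class_indices(predict_batch(x_batch), threshold))
    return y_true, y_pred
```

Usage would then be along the lines of y_true, y_pred = collect_labels_and_predictions(data_gen, model.predict_on_batch), followed by tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel() in the binary case. It should also accept the question's validation_generator directly, since the Keras directory iterator supports len() and indexing; because the true labels and predictions come from the same batch, shuffling does not misalign them.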