How to insert Keras model into scikit-learn pipeline?

machinery picture machinery · Feb 23, 2017 · Viewed 14.4k times · Source

I'm using a Scikit-Learn custom pipeline (sklearn.pipeline.Pipeline) in conjunction with RandomizedSearchCV for hyper-parameter optimization. This works great.

Now I would like to insert a Keras model as a first step into the pipeline. Parameters of the model should be optimized. The computed (fitted) Keras model should then be used later on in the pipeline by other steps, so I think I have to store the model as a global variable so that the other pipeline steps can use it. Is this right?

I know that Keras offers some wrappers for the Scikit-Learn API but the problem is that these wrappers already do classification / regression but I only want to compute the Keras model and nothing else.

How can this be done?

For example I have a method which returns the model:

def create_model(file_path, argument2,...):
    ...
    return model

The method needs some fixed parameters like a file path etc. but X and y is not needed (or can be ignored). The parameters of the model should be optimized (number of layers etc.).

Answer

Felipe picture Felipe · Nov 27, 2017

You need to wrap your Keras model as a Scikit learn model first, and then just proceed as normal.

Here's a quick example (I've omitted the imports for brevity)

Here is a full blog post with this one and many other examples: Scikit-learn Pipeline Examples

# create a function that returns a model, taking as parameters things you
# want to verify using cross-valdiation and model selection
def create_model(optimizer='adagrad',
                 kernel_initializer='glorot_uniform', 
                 dropout=0.2):
    model = Sequential()
    model.add(Dense(64,activation='relu',kernel_initializer=kernel_initializer))
    model.add(Dropout(dropout))
    model.add(Dense(1,activation='sigmoid',kernel_initializer=kernel_initializer))

    model.compile(loss='binary_crossentropy',optimizer=optimizer, metrics=['accuracy'])

    return model

# wrap the model using the function you created
clf = KerasRegressor(build_fn=create_model,verbose=0)

# just create the pipeline
pipeline = Pipeline([
    ('clf',clf)
])

pipeline.fit(X_train, y_train)