I have a scikit-learn pipeline with a KerasRegressor in it:
estimators = [
('standardize', StandardScaler()),
('mlp', KerasRegressor(build_fn=baseline_model, nb_epoch=5, batch_size=1000, verbose=1))
]
pipeline = Pipeline(estimators)
After training the pipeline, I am trying to save it to disk using joblib:
joblib.dump(pipeline, filename, compress=9)
But I am getting an error:
RuntimeError: maximum recursion depth exceeded
How would you save the pipeline to disk?
I struggled with the same problem, as there is no direct way to do this. Here is a hack that worked for me: I saved my pipeline into two files. The first file stores the pickled sklearn pipeline (without the Keras model), and the second stores the Keras model itself:
...
from keras.models import load_model
import joblib  # formerly sklearn.externals.joblib, which was removed in scikit-learn 0.23
...
pipeline = Pipeline([
('scaler', StandardScaler()),
('estimator', KerasRegressor(build_model))
])
pipeline.fit(X_train, y_train)
# Save the Keras model first:
pipeline.named_steps['estimator'].model.save('keras_model.h5')
# This hack allows us to save the sklearn pipeline:
pipeline.named_steps['estimator'].model = None
# Finally, save the pipeline:
joblib.dump(pipeline, 'sklearn_pipeline.pkl')
del pipeline
And here is how the pipeline can be loaded back:
# Load the pipeline first:
pipeline = joblib.load('sklearn_pipeline.pkl')
# Then, load the Keras model:
pipeline.named_steps['estimator'].model = load_model('keras_model.h5')
y_pred = pipeline.predict(X_test)
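One side effect of the hack above is that the in-memory pipeline loses its Keras model once `model` is set to `None`, so it cannot predict again until the model is reassigned. If you want to keep using the pipeline after saving, a small context manager can detach the attribute only for the duration of the dump. The following is a minimal sketch of that idea, not part of scikit-learn or Keras: `detached` is a hypothetical helper name, and a `SimpleNamespace` holding a `threading.Lock` stands in for the KerasRegressor step and its unpicklable model so that the example runs with the standard library alone.

```python
import pickle
import threading
from contextlib import contextmanager
from types import SimpleNamespace


@contextmanager
def detached(obj, attr):
    """Temporarily set obj.<attr> to None and restore it on exit (hypothetical helper)."""
    saved = getattr(obj, attr)
    setattr(obj, attr, None)
    try:
        # Hand the detached value to the caller so it can be saved separately.
        yield saved
    finally:
        # Put the original value back even if saving raised an exception.
        setattr(obj, attr, saved)


# Stand-in for pipeline.named_steps['estimator']; the lock cannot be pickled,
# so it plays the role of the Keras model here (assumption for illustration).
est = SimpleNamespace(name="estimator", model=threading.Lock())

with detached(est, "model") as model:
    # In the real pipeline this is where you would call
    # model.save('keras_model.h5') and joblib.dump(pipeline, 'sklearn_pipeline.pkl').
    payload = pickle.dumps(est)

# After the block, est.model is the original object again.
restored = pickle.loads(payload)
```

With the real pipeline, you would pass `pipeline.named_steps['estimator']` as `obj`. The loading side is unchanged: the unpickled step still has `model` set to `None` until you assign the result of `load_model` to it.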