Best way to combine probabilistic classifiers in scikit-learn

user1507844 · Feb 2, 2014 · Viewed 17.8k times

I have a logistic regression and a random forest, and I'd like to combine them into an ensemble that computes the final class probabilities by averaging the two models' predicted probabilities.

Is there a built-in way to do this in scikit-learn? Some way to use the ensemble of the two as a classifier in its own right? Or would I need to roll my own classifier?

Answer

user1507844 · Feb 4, 2014

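NOTE: scikit-learn's VotingClassifier (sklearn.ensemble.VotingClassifier, added in version 0.17) is probably the best way to do this now. With voting='soft' it averages the predicted probabilities of its member classifiers.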
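A minimal sketch of the soft-voting approach for the logistic regression + random forest case. The toy data from make_classification stands in for the asker's dataset, and max_iter=1000 / n_estimators=100 are illustrative choices, not values from the question. The fitted ensemble is itself a scikit-learn classifier, so it exposes fit, predict_proba and predict like any other estimator.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression

# Toy binary-classification data (assumption: stands in for the real dataset)
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# voting="soft" averages the members' predict_proba outputs
ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ],
    voting="soft",
)
ensemble.fit(X, y)

proba = ensemble.predict_proba(X[:5])  # averaged probabilities, one row per sample
pred = ensemble.predict(X[:5])         # class with the highest averaged probability
print(proba)
print(pred)
```

VotingClassifier also accepts a weights list (e.g. weights=[1, 2]) if a weighted average is preferred over a plain mean.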


OLD ANSWER:

For what it's worth, I ended up doing this as follows:

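import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin, clone

class EnsembleClassifier(BaseEstimator, ClassifierMixin):
    """Averages predict_proba over a list of classifiers."""
    def __init__(self, classifiers=None):
        self.classifiers = classifiers

    def fit(self, X, y):
        # Fit clones so the estimators passed to __init__ are left untouched
        self.classifiers_ = [clone(clf).fit(X, y) for clf in self.classifiers]
        self.classes_ = self.classifiers_[0].classes_
        return self  # scikit-learn convention: fit returns the estimator

    def predict_proba(self, X):
        # Average class probabilities; assumes all members share classes_
        probas = [clf.predict_proba(X) for clf in self.classifiers_]
        return np.mean(probas, axis=0)

    def predict(self, X):
        # Pick the class with the highest averaged probability
        return self.classes_[np.argmax(self.predict_proba(X), axis=1)]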
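The averaging done by the EnsembleClassifier above is the same computation that soft voting performs. A small sketch to check this, assuming toy data from make_classification and a fixed random_state on the forest so that both routes grow identical trees:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression

# Toy data (assumption: illustrative only)
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Route 1: fit each model and average predict_proba by hand
lr = LogisticRegression(max_iter=1000).fit(X, y)
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
manual = np.mean([lr.predict_proba(X), rf.predict_proba(X)], axis=0)

# Route 2: let VotingClassifier do the averaging
voting = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ],
    voting="soft",
).fit(X, y)

same = np.allclose(manual, voting.predict_proba(X))
print(same)
```

If the two routes agree, the hand-rolled class can be swapped for VotingClassifier without changing predictions.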