fit_transform() takes 2 positional arguments but 3 were given with LabelBinarizer

Viral Parmar picture Viral Parmar · Sep 11, 2017 · Viewed 29.3k times · Source

I am totally new to Machine Learning and I have been working with unsupervised learning technique.

Image shows my sample Data(After all Cleaning) Screenshot : Sample Data

I have this two Pipline built to Clean the Data:

num_attribs = list(housing_num)
cat_attribs = ["ocean_proximity"]

print(type(num_attribs))

num_pipeline = Pipeline([
    ('selector', DataFrameSelector(num_attribs)),
    ('imputer', Imputer(strategy="median")),
    ('attribs_adder', CombinedAttributesAdder()),
    ('std_scaler', StandardScaler()),
])

cat_pipeline = Pipeline([
    ('selector', DataFrameSelector(cat_attribs)),
    ('label_binarizer', LabelBinarizer())
])

Then I did the union of this two pipelines and the code for the same is shown below :

from sklearn.pipeline import FeatureUnion

full_pipeline = FeatureUnion(transformer_list=[
        ("num_pipeline", num_pipeline),
        ("cat_pipeline", cat_pipeline),
    ])

Now I am trying to do fit_transform on the Data But Its showing Me the Error.

Code for Transformation:

housing_prepared = full_pipeline.fit_transform(housing)
housing_prepared

Error message:

fit_transform() takes 2 positional arguments but 3 were given

Answer

Zaid E. picture Zaid E. · Oct 7, 2017

The Problem:

The pipeline is assuming LabelBinarizer's fit_transform method is defined to take three positional arguments:

def fit_transform(self, x, y)
    ...rest of the code

while it is defined to take only two:

def fit_transform(self, x):
    ...rest of the code

Possible Solution:

This can be solved by making a custom transformer that can handle 3 positional arguments:

  1. Import and make a new class:

    from sklearn.base import TransformerMixin #gives fit_transform method for free
    class MyLabelBinarizer(TransformerMixin):
        def __init__(self, *args, **kwargs):
            self.encoder = LabelBinarizer(*args, **kwargs)
        def fit(self, x, y=0):
            self.encoder.fit(x)
            return self
        def transform(self, x, y=0):
            return self.encoder.transform(x)
    
  2. Keep your code the same only instead of using LabelBinarizer(), use the class we created : MyLabelBinarizer().


Note: If you want access to LabelBinarizer Attributes (e.g. classes_), add the following line to the fit method:

    self.classes_, self.y_type_, self.sparse_input_ = self.encoder.classes_, self.encoder.y_type_, self.encoder.sparse_input_