name 'DataFrameSelector' is not defined

Isaac A picture Isaac A · Jan 28, 2018 · Viewed 9.4k times · Source

I am currently reading the "Hands-On Machine Learning with Scikit-Learn & TensorFlow". I get an error when I am trying to recreate the Transformation Pipelines code. How can I fix this?

Code:

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

num_pipeline = Pipeline([('imputer', Imputer(strategy = "median")),
                        ('attribs_adder', CombinedAttributesAdder()),
                        ('std_scaler', StandardScaler()),
                        ])

housing_num_tr = num_pipeline.fit_transform(housing_num)

from sklearn.pipeline import FeatureUnion

num_attribs = list(housing_num)
cat_attribs = ["ocean_proximity"]

num_pipeline = Pipeline([
                         ('selector', DataFrameSelector(num_attribs)),
                         ('imputer', Imputer(strategy = "median")),
                         ('attribs_adder', CombinedAttributesAdder()),
                         ('std_scaler', StandardScaler()),
                        ])

cat_pipeline = Pipeline([('selector', DataFrameSelector(cat_attribs)), 
                         ('label_binarizer', LabelBinarizer()),
                        ])

full_pipeline = FeatureUnion(transformer_list = [("num_pipeline", num_pipeline), 
                                                 ("cat_pipeline", cat_pipeline),
                                                ])

# And we can now run the whole pipeline simply:

housing_prepared = full_pipeline.fit_transform(housing)
housing_prepared

Error:

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-350-3a4a39e5bc1c> in <module>()
     43 
     44 num_pipeline = Pipeline([
---> 45                          ('selector', DataFrameSelector(num_attribs)),
     46                          ('imputer', Imputer(strategy = "median")),
     47                          ('attribs_adder', CombinedAttributesAdder()),

NameError: name 'DataFrameSelector' is not defined

Answer

Stephen Rauch picture Stephen Rauch · Jan 28, 2018

DataFrameSelector is not being found, and will need to be imported. It is not part of sklearn, but something of the same name is available in sklearn-features:

from sklearn_features.transformers import DataFrameSelector

(DOCS)