How to use KBinsDiscretizer to make continuous data into bins in Sklearn?

Mass17 · Dec 28, 2018 · Viewed 7.5k times

I am working on an ML algorithm in which I tried to convert the continuous target values into small bins, to understand the problem better and hence make better predictions. My original problem is a regression problem, but I converted it into a classification problem by grouping the target into small labeled bins.

I did the following:

from sklearn.preprocessing import KBinsDiscretizer  
est = KBinsDiscretizer(n_bins=3, encode='ordinal', strategy='uniform')
s = est.fit(target) 
Xt = est.transform(s)

It raises the ValueError shown below. I then reshaped my data into 2D, yet I still could not solve it.

ValueError: Expected 2D array, got 1D array instead:

import pandas as pd
from sklearn.preprocessing import KBinsDiscretizer

myData = pd.read_csv("train.csv", delimiter=",")
target = myData.iloc[:,-5]  # continuous column that must be binned
                        # and written to a new column.

xx = target.values.reshape(21263,1)

est = KBinsDiscretizer(n_bins=3, encode='ordinal', strategy='uniform')
s = est.fit(xx) 
Xt = est.transform(s)

As you can see, my target has 21263 rows. I have to divide these into 10 equal bins and write the bin labels into a new column in my dataframe. Thanks for the guidance.

P.S.: Max target value: 185.0
Min target value: 0.00021

Answer

Mass17 · Dec 28, 2018

Okay, I was able to solve it. I am posting the answer in case anyone else needs this in the future. I used pandas.qcut:

target['Temp_class'] = pd.qcut(target['Temeratue'], 10, labels=False)

This has solved my problem.
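
For anyone who still wants to use KBinsDiscretizer: the reshaped attempt in the question most likely fails because est.fit(xx) returns the fitted estimator itself, so est.transform(s) receives the estimator instead of the data. Below is a minimal sketch of a corrected call. It assumes a synthetic stand-in column named target and new column names target_bin_kbd and target_bin_qcut (all hypothetical, since the real train.csv is not available here). Note also that pd.qcut makes equal-frequency bins, which corresponds to strategy='quantile', while strategy='uniform' makes equal-width bins, like pd.cut.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import KBinsDiscretizer

# Hypothetical stand-in for the real target column (assumption, not the asker's data).
rng = np.random.default_rng(0)
df = pd.DataFrame({"target": rng.uniform(0.00021, 185.0, size=1000)})

# KBinsDiscretizer expects a 2D array of shape (n_samples, n_features).
xx = df["target"].to_numpy().reshape(-1, 1)

# strategy="uniform" gives equal-width bins (as in the question);
# strategy="quantile" would give equal-frequency bins instead.
est = KBinsDiscretizer(n_bins=10, encode="ordinal", strategy="uniform")

# Pass the data, not the fitted estimator, to the transform step.
# fit_transform fits the bin edges and bins the data in one call.
binned = est.fit_transform(xx)

# Flatten the (n_samples, 1) result and store it as an integer label column.
df["target_bin_kbd"] = binned.ravel().astype(int)

# The pandas approach from the answer, for comparison (equal-frequency bins).
df["target_bin_qcut"] = pd.qcut(df["target"], 10, labels=False)

print(df.head())
```

Using reshape(-1, 1) instead of a hard-coded row count keeps the code working if the number of rows changes.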