I am working on an ML algorithm in which I tried to convert the continuous target values into small bins, to understand the problem better and hopefully make better predictions. My original problem is a regression problem, but I converted it into a classification problem by binning the target and labelling the bins.
I did the following:
from sklearn.preprocessing import KBinsDiscretizer
est = KBinsDiscretizer(n_bins=3, encode='ordinal', strategy='uniform')
s = est.fit(target)
Xt = est.transform(s)
It raises the ValueError shown below. I then reshaped my data into 2D, but I still could not solve it.
ValueError: Expected 2D array, got 1D array instead:
import pandas as pd
from sklearn.preprocessing import KBinsDiscretizer
myData = pd.read_csv("train.csv", delimiter=",")
target = myData.iloc[:,-5] # this is continuous data which must be
# converted into bins and stored in a new column.
xx = target.values.reshape(21263,1)
est = KBinsDiscretizer(n_bins=3, encode='ordinal', strategy='uniform')
s = est.fit(xx)
Xt = est.transform(s)
You can see my target has 21263 rows. I have to divide these into 10 equal bins and write the result into a new column in my dataframe. Thanks for the guidance.
P.S.:
Max target value: 185.0
Min target value: 0.00021
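For reference, the reshaped attempt most likely still fails because `est.fit(xx)` returns the fitted estimator itself, so `est.transform(s)` receives the estimator rather than the data. A minimal sketch of the usual pattern follows. It uses synthetic data in place of train.csv, so the array `values` and the random seed are assumptions for illustration only.

```python
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer

# Synthetic stand-in for the continuous target column (assumption:
# the real values would come from train.csv).
rng = np.random.default_rng(0)
values = rng.uniform(0.00021, 185.0, size=1000)

# KBinsDiscretizer expects a 2D array of shape (n_samples, n_features).
xx = values.reshape(-1, 1)

est = KBinsDiscretizer(n_bins=10, encode="ordinal", strategy="uniform")

# fit() returns the estimator, so the data (not the return value of
# fit) must be passed to transform(); fit_transform() does both steps.
Xt = est.fit_transform(xx)

# Flatten back to 1D integer labels, ready to store as a new column.
labels = Xt.ravel().astype(int)
```

With `strategy="uniform"` the bins have equal width, so the number of rows per bin can differ.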
Okay, I was able to solve it. I am posting the answer in case anyone else needs it in the future. I used pandas.qcut:
target['Temp_class'] = pd.qcut(target['Temeratue'], 10, labels=False)
This has solved my problem.
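One thing worth noting: `pd.qcut` makes quantile-based bins (roughly the same number of rows in each), whereas `pd.cut` makes bins of equal width, which is closer to what `strategy='uniform'` does in `KBinsDiscretizer`. A small sketch of the difference, assuming a hypothetical DataFrame `df` with a skewed `Temperature` column (both names and the data are made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical skewed data standing in for the real target column.
rng = np.random.default_rng(0)
df = pd.DataFrame({"Temperature": rng.exponential(scale=30.0, size=1000)})

# Equal-frequency bins: each of the 10 classes gets about the same
# number of rows.
df["qcut_class"] = pd.qcut(df["Temperature"], 10, labels=False)

# Equal-width bins: the value range is split into 10 intervals of the
# same width, so class sizes can differ a lot on skewed data.
df["cut_class"] = pd.cut(df["Temperature"], 10, labels=False)

# Rows per class for each binning scheme.
qcut_counts = df["qcut_class"].value_counts().sort_index()
cut_counts = df["cut_class"].value_counts().sort_index()
```

Comparing `qcut_counts` with `cut_counts` shows which meaning of "equal bins" each function gives, which matters if the classes should be balanced.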