I want to transform the continuous values in a DataFrame column into discrete values using equal-width partitioning.
For example, given the input below, I want to divide the continuous values in column a into 3 intervals.
Input:
import pandas as pd
import numpy as np
df = pd.DataFrame({'a':[1.1, 1.2, 1.3, 2.4, 2.5, 4.1]})
Output:
a
0 1.1
1 1.2
2 1.3
3 2.4
4 2.5
5 4.1
In column a, the minimum value is 1.1 and the maximum value is 4.1, and I want to divide this range into 3 intervals. The width of each interval is (4.1 - 1.1) / 3 = 1.0. So all values in the interval [1.1, 2.1) (greater than or equal to 1.1 and less than 2.1) map to 0, all values in the interval [2.1, 3.1) map to 1, and all values in the interval [3.1, 4.1] map to 2.
So here is my expected result.
Expected:
a
0 0
1 0
2 0
3 1
4 1
5 2
You can use pd.cut with the parameter right=False:
pd.cut(df.a, bins=3, labels=np.arange(3), right=False)
0 0
1 0
2 0
3 1
4 1
5 2
Name: a, dtype: category
Categories (3, int64): [0 < 1 < 2]
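If plain integer codes are preferred over a categorical column, one option is to pass labels=False, which makes pd.cut return the integer bin index for each value. A minimal sketch reusing the question's data (the column name bin is only an illustrative choice):

```python
import pandas as pd

df = pd.DataFrame({'a': [1.1, 1.2, 1.3, 2.4, 2.5, 4.1]})

# labels=False returns integer bin indices instead of a Categorical
df['bin'] = pd.cut(df['a'], bins=3, labels=False, right=False)
print(df)
```

Assigning the result back to df['a'] instead would overwrite the original values in place, which matches the expected output in the question.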
How the binning is done:
pd.cut(df.a, bins=3, right=False)
0 [1.1, 2.1)
1 [1.1, 2.1)
2 [1.1, 2.1)
3 [2.1, 3.1)
4 [2.1, 3.1)
5 [3.1, 4.103)
Name: a, dtype: category
Categories (3, interval[float64]): [[1.1, 2.1) < [2.1, 3.1) < [3.1, 4.103)]
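The last edge is 4.103 rather than 4.1 because, with right=False, every bin is open on the right, so pd.cut widens the range slightly (by about 0.1% of the span, here 0.003) to keep the maximum inside the last bin. As a rough cross-check, the sketch below retrieves the computed edges with retbins=True and compares the result with the arithmetic from the question done by hand. The names width and manual are illustrative only.

```python
import numpy as np
import pandas as pd

a = pd.Series([1.1, 1.2, 1.3, 2.4, 2.5, 4.1], name='a')

# retbins=True also returns the bin edges that pd.cut computed
codes, edges = pd.cut(a, bins=3, labels=False, right=False, retbins=True)
print(edges)  # the last edge sits slightly above the maximum of 4.1

# Hand-rolled equal-width binning, following the question's arithmetic:
# index = floor((value - min) / width), with the maximum clipped into the last bin
width = (a.max() - a.min()) / 3
manual = np.floor((a - a.min()) / width).clip(upper=2).astype(int)

print(codes.tolist())
print(manual.tolist())
```

For this data both approaches assign the same bin to every value. Values that fall almost exactly on an interior edge could differ between the two because of floating-point rounding.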