Can I make the pandas cut/qcut functions return the bin endpoints or bin midpoints instead of a string bin label?
Currently
pd.cut(pd.Series(np.arange(11)), bins = 5)
0 (-0.01, 2]
1 (-0.01, 2]
2 (-0.01, 2]
3 (2, 4]
4 (2, 4]
5 (4, 6]
6 (4, 6]
7 (6, 8]
8 (6, 8]
9 (8, 10]
10 (8, 10]
dtype: category
with category / string values. What I want is
0 1.0
1 1.0
2 1.0
3 3.0
4 3.0
with numerical values representing edge or midpoint of the bin.
I see that this is an old post but I will take the liberty to answer it anyway.
It is now possible (ref @chrisb's answer) to access the endpoints of categorical intervals using the left and right attributes.
s = pd.cut(pd.Series(np.arange(11)), bins = 5)
mid = [(a.left + a.right)/2 for a in s]
Out[34]: [0.995, 0.995, 0.995, 3.0, 3.0, 5.0, 5.0, 7.0, 7.0, 9.0, 9.0]
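The same values can be obtained without a Python-level loop, because the categories of the result form an IntervalIndex that exposes left, right and mid as whole arrays. This is a minimal sketch, assuming the series contains no missing values (a NaN has category code -1 and would need separate handling). The variable names are only illustrative.

```python
import numpy as np
import pandas as pd

s = pd.cut(pd.Series(np.arange(11)), bins=5)

# The categories are the distinct bins, stored as an IntervalIndex.
intervals = s.cat.categories
# For each row, the position of its bin within `intervals`.
codes = s.cat.codes.to_numpy()

# Look up the endpoint / midpoint of each row's bin.
left = pd.Series(intervals.left.to_numpy()[codes], index=s.index)
right = pd.Series(intervals.right.to_numpy()[codes], index=s.index)
mid = pd.Series(intervals.mid.to_numpy()[codes], index=s.index)

print(mid.tolist())
```

The result is a plain float Series, so it can be used directly in arithmetic or a groupby.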
Since intervals are open on the left and closed on the right, the first interval (the one that should start at 0) actually starts at -0.01, so that the value 0 falls inside it. To get a midpoint using 0 as the left value, you can do this:
mid_alt = [(a.left + a.right)/2 if a.left != -0.01 else a.right/2 for a in s]
Out[35]: [1.0, 1.0, 1.0, 3.0, 3.0, 5.0, 5.0, 7.0, 7.0, 9.0, 9.0]
Alternatively, you can make the intervals closed on the left and open on the right:
t = pd.cut(pd.Series(np.arange(11)), bins = 5, right=False)
Out[38]:
0 [0.0, 2.0)
1 [0.0, 2.0)
2 [2.0, 4.0)
3 [2.0, 4.0)
4 [4.0, 6.0)
5 [4.0, 6.0)
6 [6.0, 8.0)
7 [6.0, 8.0)
8 [8.0, 10.01)
9 [8.0, 10.01)
10 [8.0, 10.01)
But, as you can see, you get the same problem at the last interval, whose right edge is extended to 10.01.
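If you want pandas to choose the bins but still want midpoints based on the true data range, one possible approach is to ask pd.cut for integer bin codes (labels=False) and the edges it used (retbins=True), then clip the extended edge back before computing the midpoints. This is a sketch, assuming no missing values in the data. The names codes, edges and mids are illustrative.

```python
import numpy as np
import pandas as pd

x = pd.Series(np.arange(11))

# labels=False returns the integer bin number for each value;
# retbins=True also returns the edges that pd.cut actually used.
codes, edges = pd.cut(x, bins=5, right=False, labels=False, retbins=True)

# Pull the slightly extended outer edge back to the data range.
edges = np.clip(edges, x.min(), x.max())
mids = (edges[:-1] + edges[1:]) / 2

# Look up the midpoint of each value's bin.
mid = pd.Series(mids[codes.to_numpy()], index=x.index)
print(mid.tolist())
```

The same clipping works with the default right=True, where the first edge is the one that gets extended.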