Is there a way to do something like a cut() function for binning numeric values in a dplyr table? I'm working on a large Postgres table and can currently either write a CASE statement in the SQL at the outset, or output unaggregated data and apply cut(). Both have pretty obvious downsides: CASE statements are not particularly elegant, and pulling a large number of records via collect() is not at all efficient.
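Just so there's an immediate answer for others arriving here via a search engine: the closest built-in to the n-breaks form of cut() is the ntile() function in dplyr. Note that ntile() splits the rows into n groups of roughly equal size by rank, whereas cut(x, n) uses n equal-width intervals. For example: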
> data.frame(x = c(5, 1, 3, 2, 2, 3)) %>% mutate(bin = ntile(x, 2))
  x bin
1 5   2
2 1   1
3 3   2
4 2   1
5 2   1
6 3   2
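
To make the difference from cut() concrete, here is a minimal sketch on the same local data frame. It puts ntile() next to base cut(), plus a plain-arithmetic version for fixed-width bins. The names lo and width are hypothetical values picked for illustration, and the claim that the arithmetic form is useful on a Postgres table rests on the assumption that your dbplyr backend translates floor() to SQL; that is worth checking with show_query() before relying on it.

```r
library(dplyr)

df <- data.frame(x = c(5, 1, 3, 2, 2, 3))

# Hypothetical lower bound and bin width for the arithmetic version
lo <- 1
width <- 2

binned <- df %>%
  mutate(
    # Rank-based: each bin receives roughly the same number of rows
    bin_ntile = ntile(x, 2),
    # Equal-width intervals over range(x); as.integer() returns the bin index
    bin_cut = as.integer(cut(x, breaks = 2)),
    # Fixed-width, left-closed bins [lo, lo + width), [lo + width, ...), ...
    bin_arith = floor((x - lo) / width) + 1
  )

print(binned)
```

On this data ntile() gives 2, 1, 2, 1, 1, 2 while cut() gives 2, 1, 1, 1, 1, 1, because cut() places the value 3 in the lower interval. The arithmetic bins are left-closed, so they will not match cut() at the boundaries either; pick whichever convention fits your data.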