I would like to categorize a numeric variable in my data.frame
object using dplyr
(and have no idea how to do it).
Without dplyr, I would probably do something like:
df <- data.frame(a = rnorm(1e3), b = rnorm(1e3))
df$a <- cut(df$a, breaks = quantile(df$a, probs = seq(0, 1, 0.2)))
and it would be done. However, I strongly prefer to do it with a dplyr
function (mutate, I suppose) as part of the chain
of other operations I perform on my data.frame.
library(dplyr)
set.seed(123)
df <- data.frame(a = rnorm(10), b = rnorm(10))
# cut() can be called directly inside mutate()
df %>% mutate(a = cut(a, breaks = quantile(a, probs = seq(0, 1, 0.2))))
giving:
                 a          b
1  (-0.586,-0.316]  1.2240818
2   (-0.316,0.094]  0.3598138
3      (0.68,1.72]  0.4007715
4   (-0.316,0.094]  0.1106827
5     (0.094,0.68] -0.5558411
6      (0.68,1.72]  1.7869131
7     (0.094,0.68]  0.4978505
8             <NA> -1.9666172
9   (-1.27,-0.586]  0.7013559
10 (-0.586,-0.316] -0.4727914
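
Note the <NA> in row 8 of the output: by default cut() builds intervals that are open on the left, so the minimum of a (which equals the lowest break) falls outside every bin. If that is not what you want, one possible fix is to pass include.lowest = TRUE. The sketch below reuses the same seed and data; the names binned and a_bin are only illustrative and not part of the original code.

```r
library(dplyr)

set.seed(123)
df <- data.frame(a = rnorm(10), b = rnorm(10))

# include.lowest = TRUE closes the first interval on the left,
# so the minimum of `a` is assigned to the first bin instead of NA
binned <- df %>%
  mutate(a_bin = cut(a,
                     breaks = quantile(a, probs = seq(0, 1, 0.2)),
                     include.lowest = TRUE))

# count rows that did not land in any bin
sum(is.na(binned$a_bin))
```

Writing the result to a new column (a_bin here) rather than overwriting a keeps the original numeric values available for later steps in the chain.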