Categorize numeric variable with mutate

r dplyr categorization

Marta Karas · Apr 19, 2014 · Viewed 32.1k times · Source

I would like to a categorize numeric variable in my data.frame object with the use of dplyr (and have no idea how to do it).

Without dplyr, I would probably do something like:

df <- data.frame(a = rnorm(1e3), b = rnorm(1e3))
df$a <- cut(df$a , breaks=quantile(df$a, probs = seq(0, 1, 0.2)))

and it would be done. However, I strongly prefer to do it with the use of some dplyr function (mutate, I suppose) in the chain sequence of other actions I do perform over my data.frame.

Answer

set.seed(123)
df <- data.frame(a = rnorm(10), b = rnorm(10))

df %>% mutate(a = cut(a, breaks = quantile(a, probs = seq(0, 1, 0.2))))

giving:

                 a          b
1  (-0.586,-0.316]  1.2240818
2   (-0.316,0.094]  0.3598138
3      (0.68,1.72]  0.4007715
4   (-0.316,0.094]  0.1106827
5     (0.094,0.68] -0.5558411
6      (0.68,1.72]  1.7869131
7     (0.094,0.68]  0.4978505
8             <NA> -1.9666172
9   (-1.27,-0.586]  0.7013559
10 (-0.586,-0.316] -0.4727914

Categorize numeric variable with mutate

Answer

Related questions