I have a data frame in which one of the columns typically takes 7-8 distinct values. I want to collapse them into 3 or 4 new categories, stored in a new variable within the same data frame. What is the best approach?
I would use a CASE statement if I were in a SQL-like tool, but I am not sure how to approach this in R.
Any help you can provide will be much appreciated!
case_when(), which was added to dplyr in May 2016, solves this problem in a manner similar to memisc::cases().
For example:
library(dplyr)
mtcars %>%
  mutate(category = case_when(
    .$cyl == 4 & .$disp < median(.$disp) ~ "4 cylinders, small displacement",
    .$cyl == 8 & .$disp > median(.$disp) ~ "8 cylinders, large displacement",
    TRUE ~ "other"
  )
  )
As of dplyr 0.7.0, the .$ prefix is no longer needed inside mutate(), so the columns can be referenced directly:
mtcars %>%
  mutate(category = case_when(
    cyl == 4 & disp < median(disp) ~ "4 cylinders, small displacement",
    cyl == 8 & disp > median(disp) ~ "8 cylinders, large displacement",
    TRUE ~ "other"
  )
  )
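For the situation in the question (collapsing 7-8 values into 3 or 4 groups), the same pattern can be combined with %in%. The following is only a sketch: the data frame df, the column fruit, and the group labels are made up for illustration, and it assumes dplyr 0.7.0 or later.

```r
library(dplyr)

# Hypothetical data: one column with seven distinct values.
df <- data.frame(
  fruit = c("apple", "pear", "lemon", "lime", "orange", "cherry", "plum"),
  stringsAsFactors = FALSE
)

# Map each original value to one of a few broader categories.
# Conditions are checked in order; the first one that is TRUE wins,
# and the final TRUE acts as the catch-all (like ELSE in SQL).
df <- df %>%
  mutate(group = case_when(
    fruit %in% c("apple", "pear")           ~ "pome",
    fruit %in% c("lemon", "lime", "orange") ~ "citrus",
    fruit %in% c("cherry", "plum")          ~ "stone",
    TRUE                                    ~ "other"
  ))
```

Because the conditions are evaluated top to bottom, more specific rules should come before more general ones, just as in a SQL CASE expression.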