Multiple condition if-else using dplyr, custom function, or purr

KyleF picture KyleF · Aug 26, 2018 · Viewed 7.7k times · Source

I have a data frame that has similar structure to the following:

set.seed(123)  
df<-data_frame(SectionName = rep(letters[1:2], 50),
               TimeSpentSeconds = sample(0:360, 100, replace = TRUE),
               Correct = sample(0:1, 100, replace = TRUE))

I want summarise this data frame by taking all values of TimeSpentSeconds that fall into certain ranges (less than 30, between 30-60, between 60-90, ..., greater than 180), label the times as those ranges, group them by SectionName, and find the sum of the Correct column so that the resulting data frame looks (something) like this:

    TimeGroup             SectionName Correct
   <fct>                 <chr>         <int>
 1 LessThan30Secs        a                 2
 2 LessThan30Secs        b                 3
 3 30-60 Seconds         a                 4
 4 30-60 Seconds         b                 3
 5 60-90 Seconds         a                 2
 6 60-90 Seconds         b                 3
 7 90-120 Seconds        a                 4
 8 90-120 Seconds        b                 0
 9 120-150 Seconds       a                 4
10 120-150 Seconds       b                 0
11 150-180 Seconds       a                 1
12 150-180 Seconds       b                 2
13 GreaterThan180Seconds a                11
14 GreaterThan180Seconds b                11

I was able to successfully do this with the following if-else code where I mutated all of the times into a new column with the appropriate label, grouped, and summarised:

x <- c("LessThan30Secs", "30-60 Seconds", "60-90 Seconds","90-120 Seconds", 
           "120-150 Seconds", "150-180 Seconds", "GreaterThan180Seconds") 

df %>% 
mutate(TimeGroup = if_else(TimeSpentSeconds >= 0 & TimeSpentSeconds <= 30, "LessThan30Secs",
                if_else(TimeSpentSeconds > 30 & TimeSpentSeconds <= 60, "30-60 Seconds",
                if_else(TimeSpentSeconds > 60 & TimeSpentSeconds <= 90, "60-90 Seconds",
                if_else(TimeSpentSeconds > 90 & TimeSpentSeconds <= 120, "90-120 Seconds",
                if_else(TimeSpentSeconds > 120 & TimeSpentSeconds <= 150, "120-150 Seconds", 
                if_else(TimeSpentSeconds > 150 & TimeSpentSeconds <= 180, "150-180 Seconds",
                if_else(TimeSpentSeconds > 180, "GreaterThan180Seconds", "")))))))) %>%
    mutate(TimeGroup = factor(TimeGroup, levels = x)) %>%
    arrange(TimeGroup) %>%
    group_by(TimeGroup, SectionName) %>%
    summarise(Correct = sum(Correct))

But, there just has to be a better way to do this. I considered writing a function, but didn't get very far as I'm not great at function writing.

Does anyone have any ideas on a more elegant way to accomplish this same output through a dplyr method I didn't think of, writing a custom function maybe utilizing the purrr package at some point, or some other r function?

Answer

Paul picture Paul · Aug 27, 2018

case_when() will do what you want. Its a tidy alternative to nested ifelse() statements.

library(dplyr)

mutate(df,
       TimeGroup = case_when(
         TimeSpentSeconds >= 0 & TimeSpentSeconds <= 30 ~ "Less Than 30 Secs",
         TimeSpentSeconds > 30 & TimeSpentSeconds <= 60 ~ "30-60 Seconds",
         TimeSpentSeconds > 60 & TimeSpentSeconds <= 90 ~ "60-90 Seconds",
         TimeSpentSeconds > 90 & TimeSpentSeconds <= 120 ~ "90-120 Seconds",
         TimeSpentSeconds > 120 & TimeSpentSeconds <= 150 ~ "120-150 Seconds", 
         TimeSpentSeconds > 150 & TimeSpentSeconds <= 180 ~ "150-180 Seconds",
         TimeSpentSeconds > 180 ~ "Greater Than 180 Seconds",
         TRUE ~ NA_character_)
)

The last argument is a catch all for records that don't fit any of the criteria, such as if time is somehow less than 0 seconds.