Output a numeric value from 'cut()' in R

r cut
Andrew · Sep 2, 2015

I've read this question here: Group numeric values by the intervals

However, I would like the output to be numeric rather than a factor: specifically, the numeric values of the lower and/or upper bounds, in separate columns.

The following is essentially what I want, except that 'df$start' and 'df$end' come back as factors:

df$start <- cut(df$x, 
                breaks = c(0,25,75,125,175,225,299),
                labels = c(0,25,75,125,175,225),
                right = TRUE)

df$end <- cut(df$x, 
              breaks = c(0,25,75,125,175,225,299),
              labels = c(25,75,125,175,225,299),
              right = TRUE)

Calling 'as.numeric()' on the result returns the integer level codes of the factor (i.e. values 1-6) rather than the original numbers.
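To make the behaviour concrete, here is a small sketch using the same breaks and labels as above. The input vector 'x' is made up for illustration. It also shows the usual workaround of going through 'as.character()' before converting, which recovers the label values:

```r
# Toy data; the values are made up for illustration.
x <- c(10, 50, 100, 299)
f <- cut(x,
         breaks = c(0, 25, 75, 125, 175, 225, 299),
         labels = c(0, 25, 75, 125, 175, 225),
         right = TRUE)

codes  <- as.numeric(f)                # integer level codes, not the labels
values <- as.numeric(as.character(f))  # the labels converted back to numbers

print(codes)
print(values)
```

With this input, 'codes' holds the bin positions while 'values' holds the lower bounds that were supplied as labels.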

Thanks!

Answer

user295691 · Sep 2, 2015

Much of the behavior of cut is related to creating the labels that you're not interested in. You're probably better off using findInterval or .bincode.

Start with some sample data:

set.seed(17)
df <- data.frame(x=300 * runif(100))

Then set the breaks and find the intervals:

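breaks <- c(0,25,75,125,175,225,299)
# Index of the interval each value falls into (1 = first interval)
df$interval <- findInterval(df$x, breaks)
# Look up the lower and upper bound of that interval as plain numerics
df$start <- breaks[df$interval]
df$end <- breaks[df$interval + 1]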
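One caveat: by default findInterval treats intervals as closed on the left, [lower, upper), whereas cut with right = TRUE uses (lower, upper]. Values at or above the last break also get an index past the final interval, so their end is NA. If matching cut exactly matters, .bincode is a closer fit. The following is a sketch of that variant on the same sample data, not a drop-in replacement tested against your real data; the variable name 'bin' is only for illustration:

```r
set.seed(17)
df <- data.frame(x = 300 * runif(100))
breaks <- c(0, 25, 75, 125, 175, 225, 299)

# .bincode() returns the integer bin index using the same (lower, upper]
# convention as cut(..., right = TRUE); values outside the breaks give NA.
bin <- .bincode(df$x, breaks, right = TRUE)

# Indexing with NA yields NA, so out-of-range rows get NA bounds.
df$start <- breaks[bin]
df$end   <- breaks[bin + 1]

head(df)
```

Both columns are plain numeric vectors, so no factor conversion is needed afterwards.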