How can I create a new integer column recode that recodes an existing column y in the data frame df using a dplyr approach?
# Generate random data
df <- data.frame(x = sample(1:100, 50),
                 y = sample(LETTERS, 50, replace = TRUE),
                 stringsAsFactors = FALSE)
# Structure of the data
str(df)
# 'data.frame': 50 obs. of 2 variables:
# $ x: int 90 4 33 85 30 19 78 77 7 10 ...
# $ y: chr "N" "B" "P" "W" ...
# Convert the character vector to a factor
df$y <- factor(df$y)
# Structure of the data, to look at the effect of the factor conversion
str(df)
# 'data.frame': 50 obs. of 2 variables:
# $ x: int 90 4 33 85 30 19 78 77 7 10 ...
# $ y: Factor w/ 23 levels "A","B","C","E",..: 12 2 14 21 12 22 7 1 6 17 ...
# Collect the levels of the factor variable
labs <- levels(df$y)
# Recode the levels to sequential integers
recode <- seq_along(labs)
# Creates the recode dataframe
dfrecode <- data.frame(labs, recode)
# Mapping the recodes to the original data
df$recode <- dfrecode[match(df$y, dfrecode$labs), 'recode']
This code works as expected, but I would like to replace it with a dplyr-based or otherwise more efficient approach. I can achieve the same result using this approach if I know all the values, but I would like to do it without inspecting or explicitly listing the values present in the column.
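The trick here is that calling as.numeric() on a factor returns the underlying integer level codes rather than the labels. Note that as.numeric() returns them as doubles; use as.integer() if you need a true integer column. So, try this: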
df <- data.frame(x = sample(1:100, 50),
                 y = sample(LETTERS, 50, replace = TRUE),
                 stringsAsFactors = FALSE)
library(dplyr)
dfrecode <- df %>%
  mutate(recode = as.numeric(factor(y)))
str(dfrecode)
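To check that this matches the lookup-table approach from the question, here is a minimal sketch. It assumes a fixed seed (set.seed(42), chosen only for reproducibility) and uses as.integer() instead of as.numeric() so that the new column really is of integer type. The names lookup and expected are introduced here purely for illustration.

```r
library(dplyr)

set.seed(42)  # assumed seed, only so the sketch is reproducible
df <- data.frame(x = sample(1:100, 50),
                 y = sample(LETTERS, 50, replace = TRUE),
                 stringsAsFactors = FALSE)

# as.integer() keeps the level codes as integers;
# as.numeric() would return the same values as doubles
dfrecode <- df %>%
  mutate(recode = as.integer(factor(y)))

# Rebuild the question's lookup-table result for comparison
labs <- levels(factor(df$y))
lookup <- data.frame(labs, recode = seq_along(labs))
expected <- lookup[match(df$y, lookup$labs), "recode"]

# Both checks should be TRUE if the two approaches agree
is.integer(dfrecode$recode)
identical(dfrecode$recode, expected)
```

Because factor() sorts the unique values to build its levels, the codes follow the sorted order of the values in y, which is the same numbering the lookup table produces.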