Recode categorical factor with N categories into N binary columns

Keith Hughitt picture Keith Hughitt · Apr 24, 2013 · Viewed 29k times · Source

Original data frame:

v1 = sample(letters[1:3], 10, replace=TRUE)
v2 = sample(letters[1:3], 10, replace=TRUE)
df = data.frame(v1,v2)
df
   v1 v2
1   b  c
2   a  a
3   c  c
4   b  a
5   c  c
6   c  b
7   a  a
8   a  b
9   a  c
10  a  b

New data frame:

new_df = data.frame(row.names=rownames(df))
for (i in colnames(df)) {
    for (x in letters[1:3]) {
        #new_df[x] = as.numeric(df[i] == x)
        new_df[paste0(i, "_", x)] = as.numeric(df[i] == x)
    }
}
   v1_a v1_b v1_c v2_a v2_b v2_c
1     0    1    0    0    0    1
2     1    0    0    1    0    0
3     0    0    1    0    0    1
4     0    1    0    1    0    0
5     0    0    1    0    0    1
6     0    0    1    0    1    0
7     1    0    0    1    0    0
8     1    0    0    0    1    0
9     1    0    0    0    0    1
10    1    0    0    0    1    0

For small datasets this is fine, but it becomes slow for much larger datasets.

Anyone know of a way to do this without using looping?

Answer

Arun picture Arun · Apr 24, 2013

Even better with the help of @AnandaMahto's search capabilities,

model.matrix(~ . + 0, data=df, contrasts.arg = lapply(df, contrasts, contrasts=FALSE))
#    v1a v1b v1c v2a v2b v2c
# 1    0   1   0   0   0   1
# 2    1   0   0   1   0   0
# 3    0   0   1   0   0   1
# 4    0   1   0   1   0   0
# 5    0   0   1   0   0   1
# 6    0   0   1   0   1   0
# 7    1   0   0   1   0   0
# 8    1   0   0   0   1   0
# 9    1   0   0   0   0   1
# 10   1   0   0   0   1   0

I think this is what you're looking for. I'd be happy to delete if it's not so. Thanks to @G.Grothendieck (once again) for the excellent usage of model.matrix!

cbind(with(df, model.matrix(~ v1 + 0)), with(df, model.matrix(~ v2 + 0)))
#    v1a v1b v1c v2a v2b v2c
# 1    0   1   0   0   0   1
# 2    1   0   0   1   0   0
# 3    0   0   1   0   0   1
# 4    0   1   0   1   0   0
# 5    0   0   1   0   0   1
# 6    0   0   1   0   1   0
# 7    1   0   0   1   0   0
# 8    1   0   0   0   1   0
# 9    1   0   0   0   0   1
# 10   1   0   0   0   1   0

Note: Your output is just:

with(df, model.matrix(~ v2 + 0))

Note 2: This gives a matrix. Fairly obvious, but still, wrap it with as.data.frame(.) if you want a data.frame.