In R, how to collapse categories or recategorize variables?

CCA · Jul 16, 2010

I am sure this is a very basic question:

In R I have 600,000 categorical variables, each of which is coded as "0", "1", or "2".

What I would like to do is collapse "1" and "2" and leave "0" by itself, so that after recoding "0" = "0", "1" = "1", and "2" = "1". In the end I want only "0" and "1" as categories for each of the variables.

Also, if possible, I would rather not create 600,000 new variables. If I can replace the existing variables with the new values, that would be great!

What would be the best way to do this?

Thank you!

Answer

maja zaloznik · Jan 29, 2012

I find it even more general to use factor(new.levels[x]):

> x <- factor(sample(c("0","1","2"), 10, replace=TRUE)) 
> x
 [1] 0 2 2 2 1 2 2 0 2 1
Levels: 0 1 2
> new.levels <- c(0, 1, 1)
> x <- factor(new.levels[x])
> x
 [1] 0 1 1 1 1 1 1 0 1 1
Levels: 0 1
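
The recode above works because indexing with a factor uses the factor's underlying integer codes (1, 2, 3 for the sorted levels "0", "1", "2"), not its labels. A minimal sketch of that mechanism, using a fixed vector instead of sample() so the result is reproducible (the names codes and recoded are only illustrative):

```r
# Fixed input instead of sample(), so the result is reproducible
x <- factor(c("0", "2", "2", "1", "0"))

# A factor stores integer codes that point into its sorted levels
codes <- as.integer(x)            # codes into levels "0", "1", "2"

# Position i of new.levels is the replacement for the i-th level of x
new.levels <- c(0, 1, 1)
recoded <- factor(new.levels[x])  # equivalent to new.levels[codes]
```

Because the lookup is positional, the order of new.levels has to follow the order of levels(x).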

The new levels vector must be the same length as the number of levels in x, so you can also do more complicated recodes, using strings and NAs for example. Applied to the original three-level x:

> x <- factor(c("old", "new", NA)[x])
> x
 [1] old  <NA> <NA> <NA> new  <NA> <NA> old  <NA> new 
Levels: new old
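
To overwrite many variables in place rather than creating new ones, as the question asks, something like the following might work. This is only a sketch under assumptions: it supposes the variables are factor columns of a data frame (here a small hypothetical one called dat), and it uses a helper, collapse12, that is made up for this example. It also swaps in a different technique from the indexing above: it renames the "2" level to "1" by label, which merges the two levels and does not depend on every column containing all three levels.

```r
# Hypothetical data frame standing in for the 600,000 variables
dat <- data.frame(
  v1 = factor(c("0", "1", "2", "0")),
  v2 = factor(c("2", "2", "1", "0")),
  v3 = factor(c("1", "0", "0", "2"))
)

# Rename level "2" to "1"; assigning a duplicated label merges the levels
collapse12 <- function(f) {
  levels(f)[levels(f) == "2"] <- "1"
  f
}

# dat[] <- keeps the data frame structure and overwrites each column in place
dat[] <- lapply(dat, collapse12)
```

After this, every column of dat is a factor with only the levels "0" and "1", and no additional variables have been created.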