R factor NA vs <NA>

screechOwl · Jun 14, 2013

I have the following data frame:

df1 <- data.frame(id = 1:20, fact1 = factor(rep(c('abc','def','NA',''),5)))
df1
   id fact1
1   1   abc
2   2   def
3   3    NA
4   4      
5   5   abc
6   6   def
7   7    NA
8   8      
9   9   abc
10 10   def
11 11    NA
12 12      
13 13   abc
14 14   def
15 15    NA
16 16      
17 17   abc
18 18   def
19 19    NA
20 20      

I'm trying to standardize all the missing values ('' and 'NA') so that they become NA. However, when I use this:

df1[df1 == ''] <- NA

there seem to be two kinds of NA:

df1
   id fact1
1   1   abc
2   2   def
3   3    NA
4   4  <NA>
5   5   abc
6   6   def
7   7    NA
8   8  <NA>
9   9   abc
10 10   def
11 11    NA
12 12  <NA>
13 13   abc
14 14   def
15 15    NA
16 16  <NA>
17 17   abc
18 18   def
19 19    NA
20 20  <NA>

Is there a best-practices method for dealing with this situation?

Answer

Zach · Jun 14, 2013

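Expanding on joran's comment: the 'NA' in your data is a literal string, so it is an ordinary factor level and prints as NA, while a genuinely missing value in a factor prints as <NA>. Compare: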

df1 <- data.frame(id = 1:5, fact1 = factor(c('abc','def', NA, 'NA','')))
> df1
  id fact1
1  1   abc
2  2   def
3  3  <NA>
4  4    NA
5  5      
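
One way to confirm which entries are genuinely missing is to look at the factor's levels and at is.na(). The following is only a minimal sketch; the variable name f is made up for illustration and is not part of the original data.

```r
# f is a hypothetical stand-alone copy of the fact1 column above
f <- factor(c('abc', 'def', NA, 'NA', ''))

# The strings '' and 'NA' are ordinary levels; the real missing value is not a level
levels(f)

# Only the genuinely missing third element is flagged here
is.na(f)
```

So the entry printed as NA is a level like any other, and only the entry printed as <NA> is treated as missing.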

df1[df1 == '' | df1 == 'NA'] <- NA
> df1
  id fact1
1  1   abc
2  2   def
3  3  <NA>
4  4  <NA>
5  5  <NA>
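
One thing to be aware of: after this replacement the values are missing, but '' and 'NA' are still listed as (now unused) levels of the factor. If that matters, for example in tables or model output, droplevels() can remove them. A minimal sketch, assuming the same five-row df1 as above:

```r
# Same example data and replacement as in the answer above
df1 <- data.frame(id = 1:5, fact1 = factor(c('abc', 'def', NA, 'NA', '')))
df1[df1 == '' | df1 == 'NA'] <- NA

# '' and 'NA' no longer occur in the data but are still registered as levels
levels(df1$fact1)

# Drop the levels that are no longer used
df1$fact1 <- droplevels(df1$fact1)
levels(df1$fact1)   # only 'abc' and 'def' remain
```

Whether to drop the unused levels is a judgment call; leaving them in is harmless for most purposes.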