Change stringsAsFactors settings for data.frame

VincentH · Jul 18, 2012

I have a function in which I define a data.frame and then fill it with data using loops. At some point I get the following warning:

Warning messages:
1: In [<-.factor(*tmp*, iseq, value = "CHANGE") :
  invalid factor level, NAs generated

Therefore, when I define my data.frame, I'd like to set the stringsAsFactors option to FALSE, but I don't understand how to do it.
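The warning is what R reports when you assign a string that is not one of a factor's existing levels. The sketch below reproduces it; the data frame `df` and its `status` column are hypothetical stand-ins, since the original loop is not shown.

```r
# Force the character column into a factor (levels: "FAIL", "OK")
df <- data.frame(status = c("OK", "OK", "FAIL"), stringsAsFactors = TRUE)

# "CHANGE" is not an existing level, so R warns
# ("invalid factor level, NA generated") and stores NA instead
df$status[2] <- "CHANGE"

# The column is still a factor and the second element is now missing
print(df$status)
```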

I have tried:

DataFrame = data.frame(stringsAsFactors=FALSE)

and also:

options(stringsAsFactors=FALSE)

What is the correct way to set the stringsAsFactors option?

Answer

MvG · Jul 18, 2012

It depends on how you fill your data frame, for which you haven't given any code. When you construct a new data frame, you can do it like this:

x <- data.frame(aName = aVector, bName = bVector, stringsAsFactors = FALSE)

In this case, if e.g. aVector is a character vector, then the data frame column x$aName will be a character vector as well, not a factor. Combining that data frame with an existing one (using rbind, cbind or similar) should preserve that type.
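Applied to the loop-filling scenario from the question, a minimal sketch might preallocate the data frame with stringsAsFactors = FALSE and then assign values row by row. The names `results`, `more` and `n` and the values written are hypothetical, since the question does not include the actual loop.

```r
n <- 3

# Preallocate; stringsAsFactors = FALSE keeps 'status' a character column
results <- data.frame(id = integer(n),
                      status = character(n),
                      stringsAsFactors = FALSE)

# Fill row by row: any string can be stored, so no "invalid factor level" warning
for (i in seq_len(n)) {
  results$id[i] <- i
  results$status[i] <- if (i %% 2 == 0) "CHANGE" else "OK"
}

# Appending rows from another character-based data frame keeps the column character
more <- data.frame(id = 4L, status = "NEW", stringsAsFactors = FALSE)
results <- rbind(results, more)
print(results)
```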

When you execute

options(stringsAsFactors = FALSE)

you change the global default setting. From then on, data frames you create will not convert strings to factors unless you explicitly ask for it. If you only need to avoid conversion in a single place, I'd rather not change the default. However, if this affects many places in your code, changing the default seems like a good idea.
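If you do set the option inside a function, one common pattern is to save the previous value and restore it on exit so the change does not leak into other code. The sketch below assumes a hypothetical helper `make_df()`. Note that since R 4.0.0 the default for data.frame() is already stringsAsFactors = FALSE, so this mainly matters on older R versions.

```r
make_df <- function() {
  # options() returns the old values, which can be passed back to options()
  old <- options(stringsAsFactors = FALSE)
  # Restore the previous setting however the function exits
  on.exit(options(old), add = TRUE)

  # No per-call stringsAsFactors argument needed while the option is set
  data.frame(name = c("a", "b"))
}

df <- make_df()
str(df)
```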

One more thing: if a column is already a factor, neither of the above will change it back into a character vector. To do so, convert it explicitly using as.character or similar.
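A minimal sketch of that conversion, using a hypothetical data frame: as.character() on a factor returns its labels (not the underlying integer codes), and the same call can be applied to every factor column at once.

```r
df <- data.frame(id = 1:3,
                 status = c("OK", "FAIL", "OK"),
                 code = c("a", "b", "a"),
                 stringsAsFactors = TRUE)

# Convert a single factor column back to character
df$status <- as.character(df$status)

# Convert every remaining factor column at once
is_fac <- vapply(df, is.factor, logical(1))
df[is_fac] <- lapply(df[is_fac], as.character)

str(df)
```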