How do I convert certain columns of a data frame to become factors?

math11 · Nov 28, 2012 · Viewed 205.3k times

Possible Duplicate:
identifying or coding unique factors using R

I'm having some trouble with R.

I have a data set similar to the following, but much longer.

A B Pulse
1 2 23
2 2 24
2 2 12
2 3 25
1 1 65
1 3 45

Basically, the first 2 columns are coded. A has 1, 2 which represent 2 different weights. B has 1, 2, 3 which represent 3 different times.

As they are coded numerical values, R will treat them as numerical variables. I need to use the factor function to convert these variables into factors.

Help?

Answer

Jeff Allen · Nov 28, 2012

Here's an example:

#Create a data frame
> d<- data.frame(a=1:3, b=2:4)
> d
  a b
1 1 2
2 2 3
3 3 4

#currently, there are no levels in the `a` column, since it's numeric as you point out.
> levels(d$a)
NULL

#Convert that column to a factor
> d$a <- factor(d$a)
> d
  a b
1 1 2
2 2 3
3 3 4

#Now it has levels.
> levels(d$a)
[1] "1" "2" "3"
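
To convert several columns at once, as in the question's data, the same idea can be applied with lapply(). This is a minimal sketch that rebuilds the six sample rows from the question by hand; the vector name `cols` is just an illustrative choice.

```r
# Rebuild the sample data from the question (six rows only)
d <- data.frame(
  A     = c(1, 2, 2, 2, 1, 1),
  B     = c(2, 2, 2, 3, 1, 3),
  Pulse = c(23, 24, 12, 25, 65, 45)
)

# Names of the columns that hold coded categories
cols <- c("A", "B")

# Apply factor() to each selected column and assign the results back
d[cols] <- lapply(d[cols], factor)

# Inspect the result: A and B are now factors, Pulse is still numeric
str(d)
```

Pulse is left untouched because it is not listed in `cols`.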

You can also handle this when reading in your data. See the colClasses and stringsAsFactors parameters of e.g. read.csv().
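
As a rough sketch of the colClasses route: the example below reads the question's sample rows from an inline string via the `text` argument rather than from a real file, and assumes the data is comma-separated. With an actual file you would pass its path as the first argument instead.

```r
# Sample rows from the question, written as inline CSV for illustration
csv_text <- "A,B,Pulse
1,2,23
2,2,24
2,2,12
2,3,25
1,1,65
1,3,45"

# A named colClasses vector sets the class of each column at read time
d2 <- read.csv(
  text = csv_text,
  colClasses = c(A = "factor", B = "factor", Pulse = "numeric")
)

# Check the column classes
sapply(d2, class)
```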

Note that, computationally, converting such columns to factors won't help you much, and may actually slow down your program (albeit negligibly). A factor stores each value as an integer ID that points into its set of levels, so any print of your data.frame requires a lookup on those levels, an extra step which takes time.
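
The ID mapping can be seen directly. This small sketch uses three Pulse values from the question purely as sample numbers; it also shows why a numeric-looking factor should be converted back through as.character() rather than as.integer() alone.

```r
# A factor built from numbers: levels are the sorted unique values, as strings
f <- factor(c(23, 24, 12))

levels(f)      # "12" "23" "24"
as.integer(f)  # 2 3 1  (the internal IDs, not the original values)

# To recover the original numbers, go through the level labels
as.numeric(as.character(f))  # 23 24 12
```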

Factors are great when storing strings which you don't want to store repeatedly, but would rather reference by their ID. Consider storing a more friendly name in such columns to fully benefit from factors.
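
One way to attach friendlier names is the labels argument of factor(). The label strings below ("weight1", "time1", and so on) are made-up placeholders, since the question does not say what the codes actually stand for.

```r
# Sample data from the question (six rows only)
d3 <- data.frame(
  A     = c(1, 2, 2, 2, 1, 1),
  B     = c(2, 2, 2, 3, 1, 3),
  Pulse = c(23, 24, 12, 25, 65, 45)
)

# Map each code to a readable label (placeholder names, not from the question)
d3$A <- factor(d3$A, levels = c(1, 2), labels = c("weight1", "weight2"))
d3$B <- factor(d3$B, levels = 1:3, labels = c("time1", "time2", "time3"))

# Printed output and summaries now show the labels instead of the codes
table(d3$A, d3$B)
```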