How to convert factor format to numeric format in R without changing the values?

MB123 · May 2, 2013

Below is a data frame, df1, in which I want to convert column "V2" from factor to numeric without changing the current values (0 ; 0 ; 8,5 ; 3).

df1=

             V1  V2 V3       X2 X3
4470 2010-03-28   0  A 21.53675  0
4471 2010-03-29   0  A 19.21611  0
4472 2010-03-30 8,5  A 21.54541  0
4473 2010-03-31   3  A       NA NA

Since column "V2" is a factor, I first convert it to character:

df1[,2]=as.character(df1[,2])

Then I try to convert "V2" to numeric format:

df1[,2]=as.numeric(df1[,2])

Leading to this R message:

Warning message: NAs introduced by coercion

And to the data frame below, where df1[3,2] has changed to NA instead of remaining "8,5":

             V1 V2 V3       X2 X3
4470 2010-03-28  0  A 21.53675  0
4471 2010-03-29  0  A 19.21611  0
4472 2010-03-30 NA  A 21.54541  0
4473 2010-03-31  3  A       NA NA 

It might have to do with the fact that 8,5 is not a whole number. Still, I do not know how to solve this problem. Help would be much appreciated!

Answer

Simon O'Hanlon · May 2, 2013

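as.numeric() only recognises "." as the decimal separator, so a string such as "8,5" is coerced to NA. Replace the comma with a point before converting: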

fac<- c( "0" , "0" , "1,5" , "0" , "0" , "8" )
#[1] "0"   "0"   "1,5" "0"   "0"   "8" 
fac <- as.numeric( sub(",", ".", fac) )
#[1] 0.0 0.0 1.5 0.0 0.0 8.0
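
Applied to the data frame from the question, the same idea might look like the sketch below. The data frame is rebuilt by hand from the printed output, so the column types (V1 as Date, V2 and V3 as factors) are assumptions rather than something the question confirms.

```r
# Rebuild the question's data frame; column types are assumed, V2 is a factor.
df1 <- data.frame(
  V1 = as.Date(c("2010-03-28", "2010-03-29", "2010-03-30", "2010-03-31")),
  V2 = factor(c("0", "0", "8,5", "3")),
  V3 = factor(c("A", "A", "A", "A")),
  X2 = c(21.53675, 19.21611, 21.54541, NA),
  X3 = c(0, 0, 0, NA),
  row.names = as.character(4470:4473)
)

# Factor -> character, swap the decimal comma for a point, then -> numeric.
# fixed = TRUE treats "," as a literal string rather than a regular expression.
df1$V2 <- as.numeric(sub(",", ".", as.character(df1$V2), fixed = TRUE))

df1$V2
#[1] 0.0 0.0 8.5 3.0
```

sub() replaces only the first comma in each string, which is enough when every value has at most one decimal comma.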

More generally, to convert a factor to its underlying values rather than to its internal integer codes:

fac <- as.factor( fac )
as.numeric(fac)
#[1] 1 1 2 1 1 3
as.numeric(as.character(fac))
#[1] 0.0 0.0 1.5 0.0 0.0 8.0

However, the canonical way to recover the original values is:

 as.numeric(levels(fac))[fac]

From the help page ?as.factor:

In particular, as.numeric applied to a factor is meaningless, and may happen by implicit coercion. To transform a factor f to approximately its original numeric values, as.numeric(levels(f))[f] is recommended and slightly more efficient than as.numeric(as.character(f)).
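
To see why the levels-based form works, and how it combines with the comma replacement, here is a minimal sketch. The factor f and the helper names lev_num, via_levels and via_chars are made up for illustration.

```r
# A factor with a decimal comma in one label, as in the question.
f <- factor(c("0", "0", "8,5", "3"))

levels(f)       # the distinct labels, sorted as strings: "0" "3" "8,5"
as.integer(f)   # integer codes, i.e. positions in levels(f), not the values
#[1] 1 1 3 2

# Convert each level once, then index by the factor to expand to full length.
lev_num <- as.numeric(sub(",", ".", levels(f), fixed = TRUE))
via_levels <- lev_num[f]

# The same result, converting every element individually.
via_chars <- as.numeric(sub(",", ".", as.character(f), fixed = TRUE))

identical(via_levels, via_chars)
#[1] TRUE
```

The levels-based version parses each distinct label only once, which is where the "slightly more efficient" remark in the help page comes from; for a four-row column the difference is negligible.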