Below is the data frame df1. I want to convert column "V2" from factor to numeric without changing its current values (0 ; 0 ; 8,5 ; 3).
df1=
V1 V2 V3 X2 X3
4470 2010-03-28 0 A 21.53675 0
4471 2010-03-29 0 A 19.21611 0
4472 2010-03-30 8,5 A 21.54541 0
4473 2010-03-31 3 A NA NA
Since column "V2" is in factor format I first convert it to character format:
df1[,2]=as.character(df1[,2])
Then I try to convert "V2" to numeric format:
df1[,2]=as.numeric(df1[,2])
Leading to this R message:
Warning message: NAs introduced by coercion
This produces the data frame below, where df1[3,2] has changed to NA instead of remaining "8,5":
V1 V2 V3 X2 X3
4470 2010-03-28 0 A 21.53675 0
4471 2010-03-29 0 A 19.21611 0
4472 2010-03-30 NA A 21.54541 0
4473 2010-03-31 3 A NA NA
It might have to do with the fact that 8,5 is not a whole number. Still I do not know how to solve this problem. Help would be much appreciated!
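R expects a period as the decimal separator, so "8,5" cannot be parsed as a number and becomes NA. Replace the comma with a period before converting:
fac <- c("0", "0", "1,5", "0", "0", "8")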
#[1] "0" "0" "1,5" "0" "0" "8"
fac <- as.numeric( sub(",", ".", fac) )
#[1] 0.0 0.0 1.5 0.0 0.0 8.0
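Applied to the data frame from the question, the same idea might look like the sketch below. The data frame is a hypothetical reconstruction containing only columns V1 and V2, and fixed = TRUE is an addition so that the comma is matched literally rather than as a regular expression.

```r
# Hypothetical reconstruction of df1; only V1 and V2 are needed here.
df1 <- data.frame(
  V1 = c("2010-03-28", "2010-03-29", "2010-03-30", "2010-03-31"),
  V2 = factor(c("0", "0", "8,5", "3"))
)

# Factor -> character, swap the decimal comma for a period, then -> numeric.
df1$V2 <- as.numeric(sub(",", ".", as.character(df1$V2), fixed = TRUE))

df1$V2
#[1] 0.0 0.0 8.5 3.0
```

Note that sub() replaces only the first comma in each string, which is enough for a single decimal separator.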
More generally, when converting a factor you want its underlying values rather than the internal integer codes of the factor representation:
fac <- as.factor( fac )
as.numeric(fac)
#[1] 1 1 2 1 1 3
as.numeric(as.character(fac))
#[1] 0.0 0.0 1.5 0.0 0.0 8.0
However, this is the canonical way of transforming a factor back to its original values:
as.numeric(levels(fac))[fac]
#[1] 0.0 0.0 1.5 0.0 0.0 8.0
From the help page ?as.factor:
In particular, as.numeric applied to a factor is meaningless, and may happen by implicit coercion. To transform a factor f to approximately its original numeric values, as.numeric(levels(f))[f] is recommended and slightly more efficient than as.numeric(as.character(f)).
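If the factor levels still contain decimal commas, the levels-based idiom can be combined with the comma replacement. This is a minimal sketch under that assumption; the names f and vals are illustrative only.

```r
# A factor whose levels still use a decimal comma.
f <- factor(c("0", "0", "1,5", "0", "0", "8"))

# Convert each distinct level once, then index by the factor's integer codes.
vals <- as.numeric(sub(",", ".", levels(f), fixed = TRUE))[f]

vals
#[1] 0.0 0.0 1.5 0.0 0.0 8.0
```

Because the replacement runs over the levels rather than over every element, the work scales with the number of distinct values rather than with the length of the column.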