R: as.numeric function not returning correct # from data.frame

Amanda · Aug 2, 2011 · Viewed 30.8k times

Possible Duplicate:
R - How to convert a factor to an integer\numeric in R without a loss of information

I am importing an Excel document using read.xls. I know this function uses read.table and returns everything as factors. I cannot tell read.xls on import which columns are numeric, since all columns also contain categorical data. So I have been extracting the numeric columns I want and then trying to convert them to numeric data. However, when I use as.numeric, I get numbers that do not correspond to the original data.

For example:

These are the first 6 rows of my data.frame called dfA1, which is 96 x 1:

         [,1]
[1,] "103316"
[2,] "130720"
[3,] "141808"
[4,] "131864"
[5,] "148144"
[6,] "145760"

When I perform as.numeric(dfA1) I receive:

[1]  2  18  29  19  43  40

I have absolutely no idea why I get these numbers or how it could be coming up with them. I checked my original xls document and they are marked as numeric with no decimals.

Answer

joran · Aug 2, 2011

You can try:

as.numeric(as.character(dfA1))

You can also prevent strings from being converted to factors automatically by setting stringsAsFactors = FALSE, for example globally via options (see ?options).
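As a rough illustration of the stringsAsFactors route, here is a minimal sketch that uses read.csv on inline text rather than read.xls, so it runs without a spreadsheet. The csv_text contents and the "control" row are made up for the example. It assumes the column mixes a categorical entry with the numbers, as described in the question. read.xls is documented as passing extra arguments on to read.table, so the same argument may work there, but that is not checked here. In R 4.0.0 and later, stringsAsFactors already defaults to FALSE.

```r
# Hypothetical CSV text standing in for one spreadsheet column that mixes
# a categorical entry ("control") with numeric values.
csv_text <- "A1\ncontrol\n103316\n130720\n141808"

# Keep strings as character vectors instead of converting them to factors.
raw <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# The column is plain character, so there are no hidden integer codes.
print(class(raw$A1))

# Drop the categorical first row, then convert the rest directly.
nums <- as.numeric(raw$A1[-1])
print(nums)
```

Because the column never becomes a factor, as.numeric parses the text itself and the as.character step is not needed.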

This happens because factors are stored internally as integer codes, and the labels (things like "103316" in your case) are what is displayed when you print them. The function as.numeric returns the underlying integer codes rather than the labels.
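The difference between codes and labels can be seen in a small sketch. The factor f below is a hypothetical stand-in built from the six values shown in the question, and df and its column name A1 are also made up. It assumes the imported column really is a factor. With only six values the codes differ from the ones in the question, which came from all 96 rows.

```r
# Hypothetical stand-in for the imported column: numbers stored as a factor.
f <- factor(c("103316", "130720", "141808", "131864", "148144", "145760"))

# as.numeric() on a factor returns the internal integer codes,
# i.e. each value's position among the sorted levels.
codes <- as.numeric(f)
print(codes)

# Converting to character first recovers the labels, which then parse as numbers.
values <- as.numeric(as.character(f))
print(values)

# Alternative: convert the levels once, then index them by the factor's codes.
values_alt <- as.numeric(levels(f))[f]

# For a data frame, apply the conversion to the column, not the whole frame.
df <- data.frame(A1 = f)
df$A1 <- as.numeric(as.character(df$A1))
```

If dfA1 is a one-column data frame, the conversion would likely need to target the column itself, for example dfA1[[1]], as in the last two lines of the sketch.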