Numeric variables converted to factors when reading a CSV file

intael picture intael · Nov 19, 2013 · Viewed 40.5k times · Source

I'm trying to read an Excel created .csv file into R where all the vectors (columns in the CSV file) are numeric. However, they get converted to factor everytime I import them.

Here's a sample of how my CSV looks like:

enter image description here

This is my code:

options(StringsAsFactors=F)
data<-read.csv("in.csv",dec=",",sep=";")

As you can see, I take care of setting de dec to "," and the sep to ";". However, all the vectors that should be numerics are factors!

Can someone give me some advice? Thanks!

Answer

Henrik picture Henrik · Nov 19, 2013

Your NA strings in the csv file, N/A, are interpreted as character and then the whole column is converted to character. If you have stringsAsFactors = TRUE in options or in read.csv (default), the column is further converted to factor. You can use the argument na.strings to tell read.csv which strings should be interpreted as NA.

A small example:

df <- read.csv(text = "x;y
                 N/A;2,2
                 3,3;4,4", dec = ",", sep = ";")
str(df)

df <- read.csv(text = "x;y
                 N/A;2,2
                 3,3;4,4", dec = ",", sep = ";", na.strings = "N/A")
str(df)

Update following comment

Although not apparent from the sample data provided, there is also a problem with instances of '$' concatenated to the numbers, e.g. '$3,3'. Such values will be interpreted as character, and then the dec = "," doesn't help us. We need to replace both the '$' and the ',' before the variable is converted to numeric.

df <- read.csv(text = "x;y;z
               N/A;1,1;2,2$
               $3,3;5,5;4,4", dec = ",", sep = ";", na.strings = "N/A")
df
str(df)

df[] <- lapply(df, function(x){
  x2 <- gsub(pattern = "$", replacement = "", x = x, fixed = TRUE)
  x3 <- gsub(pattern = ",", replacement = ".", x = x2, fixed = TRUE)
  as.numeric(x3)
  }
                         )
df
str(df)